
Scaling Your Node.JS API Like a Boss

volkan
March 08, 2016

Video of the presentation is available here:
https://www.youtube.com/watch?v=Ogjb60Fg10A

It’s one thing to create a sample RESTful API using Node.js (maybe utilizing the cluster module to distribute the load), but it’s quite another to horizontally scale your architecture to hundreds of thousands of concurrent connections while trying to ensure redundancy and high availability. Knowing how to scale is important, but more important than that is knowing when to scale.

Volkan Özçelik explores what it takes to create a real-life, scalable, highly available, and highly responsive Node.js application. Volkan will also explain how to store the application state in its own cluster and why it matters.

Volkan outlines how to choose the container architecture for your (virtual) machines, how to roll out updates to your service without disrupting users, and how to fail gracefully when things on a node go haywire. He also covers tracking down memory leaks, with both short-term fixes (i.e., restarting your nodes when they become too beefy) and long-term fixes (i.e., actually spotting where the leaks are and fixing them).

When you dive deeper and deeper into the rabbit hole, you soon realize that scalability is a tough job that requires careful planning and consideration. The bottom line is that designing any system to scale is a never-ending adventure, and there is no limit on how deep you can dive.


Transcript

  1. Scaling your Node.JS API …like a boss
    Volkan Özçelik, March 7, 2016
    http://bit.ly/nodejs-rocks • http://volkan.io/ • @linkibol • v0lkan
  2. About Me
    • Volkan Özçelik: JavaScript Lover & Performance Freak
    • Current:
      • Technical Lead @ Cisco
    • Before:
      • Mobile Frontend Engineer @ Jive Software
      • VP of Technology @ grou.ps (now GymGroups)
      • CTO @ cember.net (acquired by Xing)
    • Chase Me: @linkibol, v0lkan
  3. Agenda
    • Node’s Strengths and Weaknesses
    • Tweaking Our OS
    • Throughput, Concurrency, Latency
    • Scale a Real-Life Node App
  4. In a Nutshell…
    How do I Architect a Scalable and Consistent Node.JS API with Manageable Complexity?
  7. Don’t Fight Windmills
    • Keep things simpler.
    • Build something that’s good enough for your purpose.
    • Solve for the problems that are actually on your plate.
  8. Don’t Invent Problems That You Don’t Have (Yet)
    • Monitor All The Things
    • Collect Metrics
    • Form a Hypothesis
    • Gather Evidence
    • Validate Your Hypothesis
    • Take Corrective Action If Needed
  9. Goals
    • Minimize Client Response Time
    • Maximize Resource Efficiency on the Server
    Hint: Leave 50% of the memory unused (for taking core dumps)
  10. High-Level Topology of an API Service
    [Diagram labels: Clients; HTTP Proxy; Load Balancer (SSL Termination, Load Balancing); API Gateway (Authentication, Authorization, Token Exchange, Rate Limiting, …); API Service]
  12. Show Love to Functions
    • Accept JavaScript’s functional and composable nature.
    • Avoid `this` and avoid `new`; you’ll thank me later.
    • Create Focused, Independent, Reusable, and Testable Modules.
  13. Embrace the Difference
    “OO leads to anger; Anger leads to hate; Hate leads to suffering!”
  14. Node.JS Is Perfect For…
    • IO-Heavy Applications
    • Data-Intensive Realtime Apps
    • RESTful / API-Driven (Micro)services
    • Streams
    • Queued (Lazy) Writes
    • Processing data on-the-fly
    https://github.com/libuv/libuv
  15. Node.JS Is not For…
    • Serving Static Files
    • CPU-bound Applications
    • Creating a Monolithic Infrastructure
  16. Node.JS is not a Swiss Army Knife
    • Load Balancing ➡ haproxy | NGINX | ELB ( http://www.haproxy.org/ | http://nginx.org/ | http://aws.amazon.com )
    • SSL Termination ➡ stud ( https://github.com/bumptech/stud )
    • GZIP Compression ➡ NGINX | haproxy
    • Serving Static Assets ➡ CDN | NGINX | Varnish ( https://www.varnish-cache.org/ )
  17. Know Your Bottlenecks
    • Node.JS serves really well as a highly concurrent networking app.
    • Node.JS is very sensitive to memory leaks and blocking code.
    • 99% of the time you will be IO-bound.
  18. Know the Ecosystem
    • Do Not Ignore The Ecosystem
    • Follow Community News and Updates
    • Attend Conferences (like this one)
    • Know Your Tools and Use Them
  19. Even More Tweaks
    Do NOT alter anything that you don’t know!
    * See https://www.frozentux.net/ipsysctl-tutorial/ipsysctl-tutorial.html and http://www.tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/ for more info.
  20. Common Threats
    • XSS / CSRF
    • Input Validation Attack
    • DoS / ReDoS
    • Request Size
    * Securing Node.JS is not different from securing any other web app. See also: https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project
  21. Do Not Run Node.JS As Root
      useradd -mrU web
      mkdir /opt/web-app
      chown web /opt/web-app
      cd /opt/web-app
      su web
      node app.js
      firewall-cmd --permanent --zone=public --add-port=3000/tcp
    Also, always run Node.JS behind a reverse proxy!
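The shell steps above create an unprivileged `web` user. As a complementary guard, the app itself can refuse to start as root. This is a minimal sketch, not something shown in the slides; `isRoot` and `assertNotRoot` are hypothetical helper names.

```javascript
'use strict';

// Returns true when the given uid is the superuser (uid 0 on POSIX systems).
function isRoot(uid) {
  return uid === 0;
}

// Call this at startup, before binding any port.
// process.getuid is not available on Windows, hence the typeof guard.
function assertNotRoot() {
  if (typeof process.getuid === 'function' && isRoot(process.getuid())) {
    throw new Error('Refusing to start: do not run this service as root.');
  }
}
```

Calling `assertNotRoot()` as the first line of `app.js` makes a misconfigured deployment fail fast instead of silently running with full privileges.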
  22. [Chart: restify vs. http vs. tcp]
      ab -n 10000 -c 100 http://app:8000/hello
    • containers/000-simple-app-restify
    • containers/001-simple-app-http
    • containers/002-simple-app-tcp
  23. Going Bare Bones
      ab -n 10000 -c 100 http://app:8000/hello
    Tested on MacBook Pro, 2.4 GHz Intel Core i5, 16 GB 1600 MHz DDR3
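For readers without the demo containers, this is roughly what a "bare bones" `/hello` endpoint on the core `http` module could look like. It is a sketch under assumptions, not the code from `containers/001-simple-app-http`; the response body and the `handler`/`start` names are made up for illustration.

```javascript
'use strict';
const http = require('http');

// Plain request handler: no framework, no routing library.
function handler(req, res) {
  if (req.method === 'GET' && req.url === '/hello') {
    const body = 'hello';
    res.writeHead(200, {
      'Content-Type': 'text/plain',
      'Content-Length': Buffer.byteLength(body)
    });
    res.end(body);
    return;
  }
  res.writeHead(404);
  res.end();
}

// Port 8000 matches the ab command on the slide.
function start(port) {
  return http.createServer(handler).listen(port);
}

// start(8000);
```

Everything a framework would normally do (routing, parsing, error responses) is now the handler's job, which is the maintenance cost the next slide weighs against the throughput gain.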
  24. Is It Worth It?
    • You Can Go Bare-Bones for Maximum Throughput
    • Tradeoff:
      • Harder to maintain
      • More complex code
      • Error prone
      • Lots of edge cases
      • Harder to use additional tooling
  25. Distributed Load Testing Toolbox
    • Apps
      • jMeter: http://jmeter.apache.org
      • Gatling: http://gatling.io/#/
      • The Grinder: http://grinder.sourceforge.net
      • Locust: https://github.com/locustio/locust
    • “as a service”
      • flood.io: https://flood.io
      • loader.io: http://loader.io
      • LoadImpact: https://loadimpact.com
      • BlazeMeter: https://www.blazemeter.com
      • LoadStorm: http://loadstorm.com
  26. Lessons Learned
    • Latency Kills
    • Know Your Platform & Know Your Tools
    • For maximum throughput go bare bones
    • Tradeoff: Giving up all the benefits a framework has to offer
    • Low-level code is harder to maintain:
      • Harder to Test and Verify / Easier to Create Bugs and Regressions
    • Corollary: As you add additional layers of abstractions, your API will marginally slow down.
    • The Inception Rule: More than three levels and you’re lost forever!
  27. Perf Before Scale
    • Rule #1: Avoid premature optimization. Measure first, and optimize what matters.
    • Tweak Your System for High Performance
    • Cache All The Things
      • Cache at every level.
      • The fastest API response is no response at all.
    • Delegate Long-Running/CPU-Intensive(*) Operations
    • Be Lazy Whenever Possible
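To make "cache at every level" concrete at the application level, here is a small in-memory cache with a time-to-live. It is only a sketch: `createTtlCache` is a hypothetical helper, the cache is unbounded, and a shared cache (or an HTTP cache in front of the API) would be needed once there is more than one process.

```javascript
'use strict';

// Creates a tiny in-memory cache. `now` is injectable so the expiry
// logic can be exercised without waiting for real time to pass.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // lazy eviction on read
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    }
  };
}
```

A handler would call `cache.get(url)` first and only do the expensive work on a miss, storing the result with `cache.set(url, result)`.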
  28. Things to Watch Out For
    • Always Keep an Eye on the Event Loop
    • Your API Service may Become CPU-Bound
    • External API Calls Can Be a Bottleneck
    • Track Heap Usage Over Time
    • Implement Sanity Checks
    • Implement Circuit Breakers
    • Have an Upper Bound for Concurrency
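One way to "have an upper bound for concurrency" is to count in-flight requests and shed load above a limit. The sketch below assumes a plain `(req, res)` handler from the core `http` module; `createLimiter` and `withLimit` are hypothetical names, and the `Retry-After` value is arbitrary.

```javascript
'use strict';

// Tracks in-flight requests and refuses new ones above `max`.
function createLimiter(max) {
  let inFlight = 0;
  return {
    tryAcquire() {
      if (inFlight >= max) return false;
      inFlight += 1;
      return true;
    },
    release() {
      if (inFlight > 0) inFlight -= 1;
    },
    get inFlight() { return inFlight; }
  };
}

// Wraps a handler; answers 503 when the service is saturated.
function withLimit(limiter, handler) {
  return function limited(req, res) {
    if (!limiter.tryAcquire()) {
      res.writeHead(503, { 'Retry-After': '1' });
      res.end();
      return;
    }
    let released = false;
    const release = () => {
      if (!released) { released = true; limiter.release(); }
    };
    res.once('finish', release); // response fully sent
    res.once('close', release);  // connection ended, possibly early
    handler(req, res);
  };
}
```

Usage would look like `http.createServer(withLimit(createLimiter(100), handler))`; the right limit is something to measure, not guess.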
  29. Things to Watch Out For
    • Is the app running and functional?
    • Is the app overloaded?
    • How many errors have been raised so far?
    • Is the app performant (throughput, memory utilization, concurrency)?
    • Is my cluster healthy?
    • How many times have forks been restarted?
    • Are all clustered forks alive and okay?
  30. Which Will (most of the time) Boil Down to…
    • Watching Response Times
    • Watching CPU Utilization + General Sys Resource Usage
    • Watching Number of Concurrent Connections
  31. Types of Compilers in v8
    • Generic Compiler
    • Optimizing Compiler (Crankshaft)
      • Can Be Two or More Orders of Magnitude Faster
    See also:
    * https://wingolog.org/archives/2011/07/05/v8-a-tale-of-two-compilers
    * http://thibaultlaurens.github.io/javascript/2013/04/29/how-the-v8-engine-works/
    * http://www.html5rocks.com/en/tutorials/speed/v8/
  32. X-Ray View Into the v8 Compiler
      node --trace_opt --trace_deopt --allow-natives-syntax test.js;
    • console.log(%HasFastProperties(obj))
    • console.log(%GetOptimizationStatus(fn))
    https://github.com/Nathanaela/v8-natives
  34. v8 Optimization Killers
    • Using debugger anywhere within the function.
    • Using eval anywhere within the function.
    • Using with anywhere within the function.
    • Using try/catch anywhere within the function.
    * ~via https://github.com/petkaantonov/bluebird/wiki/Optimization-killers
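A common workaround for the try/catch item is to isolate the statement in a tiny helper so the hot function itself stays optimizable. The sketch below illustrates the pattern; `tryCatch` and `parsePort` are hypothetical names. Later V8 releases replaced Crankshaft with a different optimizing compiler, so it is worth measuring before applying this.

```javascript
'use strict';

// The only function that contains try/catch. It does nothing else, so an
// engine that refuses to optimize it loses very little.
function tryCatch(fn, arg) {
  try {
    return { ok: true, value: fn(arg) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

// The calling function stays free of try/catch.
function parsePort(text) {
  const result = tryCatch(JSON.parse, text);
  if (!result.ok) return null;
  const value = result.value;
  const isObject = value !== null && typeof value === 'object';
  return isObject && typeof value.port === 'number' ? value.port : null;
}
```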
  35. Let’s Create Something Real
    • An API that…
      • Auto-suggests tags, given a URL
      • Lists related URLs, given a tag
  36. Initial Topology
    [Diagram labels: test; API; API Service; Bastion; Internet (simulated by an NGINX static web server)]
    * Fetch HTML off of websites
    * Simplify and convert the HTML to plain text
    * Do NLP/Tokenization on the plain text
    * Create tags as a result
  37. Findings
    • get-tags appears to be CPU-bound.
    • While get-tags is being requested, get-urls becomes two orders of magnitude slower.
    • get-urls appears to be pretty fast, and it is not CPU-bound.
  38. How Can We Be Sure?
    • Add probes (DTrace, XTrace… etc) to trace what’s happening.
    • Create a REPL to check the app at runtime.
    containers/004-demo-w-instrumentation
  39. Creating a REPL
    • You Can Expose Internal State via an API and/or a CLI/REPL
      • vantage: https://github.com/dthree/vantage
      • kang: https://github.com/davepacheco/kang
      • repl server: https://nodejs.org/api/repl.html
    • Expose Additional Logging Info at Runtime (in systems that support it)
      • bunyan -p ( https://github.com/trentm/node-bunyan )
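The "repl server" option can be sketched with nothing but the core `net` and `repl` modules. This is an illustration, not the demo code: the `stats` object, the `app> ` prompt, and port 5001 are all made up.

```javascript
'use strict';
const net = require('net');
const repl = require('repl');

// Hypothetical application state that we want to inspect at runtime.
const stats = { requests: 0, errors: 0 };

// Each TCP connection gets its own REPL with `stats` in its context.
function createReplServer(state) {
  return net.createServer((socket) => {
    const session = repl.start({
      prompt: 'app> ',
      input: socket,
      output: socket
    });
    session.context.stats = state;
    session.on('exit', () => socket.end());
  });
}

// Bind to loopback only: anyone who can connect can run arbitrary code.
// createReplServer(stats).listen(5001, '127.0.0.1');
```

With the server listening, `nc 127.0.0.1 5001` followed by typing `stats` would print the live counters of the running process.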
  40. Monitoring Toolbox
    • Runtime Performance Probing (Kernel-Level Tools)
      • Linux Perf Events ( https://perf.wiki.kernel.org/index.php/Main_Page )
          perf record -F 71 -p `pgrep -n node` -g -- sleep 30
          node --perf_basic_prof_only_functions
      • Dtrace ( http://dtrace.org/blogs/about/ )
    • Tracking Transactions and Tracing Latency
      • Zipkin ( https://github.com/openzipkin/zipkin )
    • Runtime Memory Usage (heap stats, heap diffing, leak detection)
      • Memwatch ( https://github.com/lloyd/node-memwatch )
      • See http://jayconrod.com/posts/55/a-tour-of-v8-garbage-collection for details.
  43. Monitoring Toolbox
    • Monitoring “as a service”
      • nodetime https://nodetime.com/
      • newrelic http://newrelic.com/nodejs
      • strongloop https://strongloop.com/node-js/performance-monitoring/
      • keymetrics https://keymetrics.io/
      • appdynamics https://www.appdynamics.com/nodejs/
      • …
  44. So… Something Is CPU-Intensive
    • get-tags is CPU-bound and it also blocks the event loop
    • What can we do?
      • Split computationally heavy parts and fork as child processes and use external libraries.
      • Create a native Node.JS extension that does not block the event loop.
      • Refactor the compute logic into a separate service first.
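The first option, forking the heavy part into a child process, can be sketched with `child_process.fork`. Everything here is an assumption for illustration: `countWords` stands in for the real tokenization work, the file forks itself with a made-up `compute-child` argument, and a real service would reuse a pool of children instead of forking per request.

```javascript
'use strict';
const { fork } = require('child_process');

// Stand-in for the CPU-heavy part (hypothetical workload).
function countWords(text) {
  const counts = {};
  for (const word of text.toLowerCase().split(/\W+/)) {
    if (word) counts[word] = (counts[word] || 0) + 1;
  }
  return counts;
}

if (process.argv[2] === 'compute-child' && process.send) {
  // Child side: do the work off the parent's event loop, reply, then exit
  // once the reply has been handed to the IPC channel.
  process.once('message', (text) => {
    process.send(countWords(text), () => process.exit(0));
  });
}

// Parent side: fork this same file and resolve with the child's answer.
function countWordsInChild(text) {
  return new Promise((resolve, reject) => {
    const child = fork(__filename, ['compute-child']);
    child.once('message', resolve);
    child.once('error', reject);
    child.once('exit', (code) => {
      if (code !== 0) reject(new Error('compute child exited with code ' + code));
    });
    child.send(text);
  });
}
```

The parent's event loop stays free while the child computes, so other endpoints keep answering at their normal speed.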
  45. So… Something Is CPU-Intensive
    [Diagram labels: app; memory; worker, worker, worker; child_process]
  50. Split App and Compute Nodes
    [Diagram labels: API Service; Message Bus *; Compute Service]
    * rabbitmq, zeromq, resque etc. See also http://queues.io/
    containers/005-demo-split-compute
  51. Split App and Compute Nodes (Message Bus)
    [Diagram labels: API Service, API Service, …; RabbitMQ with a Request Queue and Response Queues; Compute Service, Compute Service, …]
    • one response queue per service
    • round-robin dispatch
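The two notes on this slide, round-robin dispatch on the request queue and one response queue per API service, can be illustrated without a broker. The sketch below is an in-memory stand-in, not RabbitMQ and not the demo code; `createBus` and `createApiService` are hypothetical names, and delivery is synchronous purely to keep the example short.

```javascript
'use strict';

// In-memory stand-in for the broker. It only mimics the two behaviours
// named on the slide.
function createBus() {
  const workers = [];            // consumers of the shared request queue
  const replyQueues = new Map(); // one response queue per API service
  let next = 0;
  return {
    addWorker(fn) { workers.push(fn); },
    addReplyQueue(name, onReply) { replyQueues.set(name, onReply); },
    // Round-robin dispatch: each request goes to the next worker in turn.
    request(message) {
      if (workers.length === 0) throw new Error('no compute service available');
      const worker = workers[next];
      next = (next + 1) % workers.length;
      const body = worker(message.body);
      replyQueues.get(message.replyTo)({ correlationId: message.correlationId, body });
    }
  };
}

// API-service side: remembers pending requests by correlation id so a
// reply can be matched to the caller that is waiting for it.
function createApiService(name, bus) {
  const pending = new Map();
  let seq = 0;
  bus.addReplyQueue(name, (reply) => {
    const callback = pending.get(reply.correlationId);
    pending.delete(reply.correlationId);
    if (callback) callback(reply.body);
  });
  return {
    call(body, callback) {
      seq += 1;
      const correlationId = name + '-' + seq;
      pending.set(correlationId, callback);
      bus.request({ replyTo: name, correlationId, body });
    }
  };
}
```

With a real broker the shape stays the same: the API service publishes to the shared request queue with a reply-to queue name and a correlation id, and any free compute service picks the message up.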
  57. Aggregate and Rotate Your Log Files
    [Diagram labels: API Service (memory); Message Bus; Compute Service (memory); Log Aggregator]
    containers/006-demo-eventbus-logaggr
  60. Use a Decent Logger
    • Bunyan ( https://github.com/trentm/node-bunyan )
    • Winston ( https://github.com/winstonjs/winston )
    • Log4JS ( https://github.com/nomiddlename/log4js-node )
  61. What to Log
    • Authentication & Authorization
    • Session Management
    • Method Entry Points
    • Errors and Weird Events
    • Specific Events (startup, shutdown, slowdown etc.)
    • High-Risk Functionalities (payments, privileges, admins etc.)
  62. Log Analysis Toolbox
    • Loggly ( https://www.loggly.com/ )
    • ELK Stack ( https://www.elastic.co/products )
    • Nagios Log Server ( https://www.nagios.com/products/nagios-log-server/ )
    • Splunk ( http://www.splunk.com/en_us/homepage.html )
    • …
  63. Processes Die
    • Accept it.
    • No system is 100% resilient.
    • Every crash is important.
    • Every Exception is Important Too: Adopt a “Zero Exception Policy”
  64. Keep It Running
    • forever ( https://github.com/foreverjs/forever )
    • pm2 ( https://github.com/Unitech/pm2 )
    • upstart ( http://upstart.ubuntu.com/ )
    • systemd ( https://www.wikiwand.com/en/Systemd )
  65. Node.JS Debugging Myths
    • Debugging and Profiling in Node.JS is Hard
    • Debugging and Profiling in Node.JS is Immature
    • You Cannot Debug or Profile a Live Production Node.JS App
  66. Debugging
    • Live Debugging (using a REPL)
    • Remote Debugging (Node Inspector https://github.com/node-inspector/node-inspector, WebStorm https://www.jetbrains.com/webstorm/, Cloud9 IDE https://c9.io/)
    • Post-Mortem Debugging (MDB: https://github.com/joyent/mdb_v8)
  71. Flame Graphs & Core Dumps
    • Core Dumps
      • Can Be Created When Node.JS Crashes ( --abort_on_uncaught_exception )
      • Can Be Created at Runtime ( using gcore * )
    • Flame Graphs
      • You Can Use dtrace + stackvis to generate them **
      • You Can Use perf events + Flame Graphs Tool to generate them ***
    * http://man7.org/linux/man-pages/man1/gcore.1.html
    ** http://blog.nodejs.org/2012/04/25/profiling-node-js/
    *** http://yunong.io/2015/11/23/generating-node-js-flame-graphs/
  72. Debugging (Profiling)
    • Use Kernel Level Tools
      • DTrace (Solaris, BSD), perf (Linux), and XPerf (Windows)
      • Can be used in production
    • Use the v8 Profiler
      • Not quite suitable for production
  73. v8 Profiler
    • node --v8-options | grep gc
    • node --v8-options | grep '\-\-trace'
    • `node --perf_basic_prof_only_functions .` => for perf events (new in Node 5)
    • `node --expose_gc --trace_gc --trace_gc_object_stats --trace_gc_verbose --gc_global .` => traces to the console
    • `node --prof --log_timer_events --track_gc_object_stats --log_internal-timer_events --no-use-inlining .` => creates a perf log file
    * See also: http://www.chromium.org/developers/creating-v8-profiling-timeline-plots
  74. Help the Debugger
    • Always Name Your Functions
    • Don’t let the errors go unhandled.
    • Emit “error” events instead of throwing exceptions.
    • Use an error library:
      • https://github.com/davepacheco/node-verror
    • Put a descriptive message before raising an error.
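A few of these tips (named functions, "error" events, descriptive messages) fit in one short sketch. `createTagFetcher` and its `tags` event are hypothetical, and the result is a placeholder rather than real tag extraction.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Named factory; the function names show up in stack traces and profiles.
function createTagFetcher() {
  const fetcher = new EventEmitter();

  fetcher.fetch = function fetchTags(url) {
    if (typeof url !== 'string' || url.length === 0) {
      // Descriptive message, delivered as an event instead of a throw.
      const detail = JSON.stringify(url);
      fetcher.emit('error', new Error('fetchTags: expected a non-empty url, got ' + detail));
      return;
    }
    fetcher.emit('tags', { url, tags: [] }); // placeholder result
  };

  return fetcher;
}
```

An emitter with no "error" listener throws when an "error" event is emitted, so callers should always attach one, which also satisfies "don’t let the errors go unhandled".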
  76. Use a Private NPM
    [Diagram labels: API Service (memory); Message Bus; Compute Service (memory); Log Aggregator; Private NPM (cache / mirror); Public NPM]
    containers/009-demo-setting-up-private-npm (sinopia) https://github.com/rlidwka/sinopia
  84. Use a Private NPM
    • Promotes modularization and code re-use.
    • Modules are cached, hence faster to install.
    • You can continue your work, even when the public registry goes offline.
    • Makes refactoring and testing easier.
    • No more “../../../..”s!
  85. Clustering
    containers/010-cluster
    * See http://docs.libuv.org/en/v1.x/threadpool.html, https://nikhilm.github.io/uvbook/processes.html, and https://nikhilm.github.io/uvbook/threads.html for how the dark magic works internally.
    * See also https://strongloop.com/strongblog/whats-new-in-node-js-v0-12-cluster-round-robin-load-balancing/ for how the load balancing between processes in the cluster module evolved over time; and see https://github.com/nodejs/node-v0.x-archive/commit/e72cd41 for the Round-Robin cluster load balancing algorithm.
  86. Clustering
    [Diagram labels: app, app, app, app; memory]
    * See https://strongloop.com/strongblog/whats-new-in-node-js-v0-12-cluster-round-robin-load-balancing/ for how the load balancing between processes in the cluster module evolved over time; and see https://github.com/nodejs/node-v0.x-archive/commit/e72cd41 for the Round-Robin cluster load balancing algorithm.
  87. How Many Workers Per VM?
    Two to four cores per VM is an ideal balance.
    [Diagram labels: master; child_process, child_process, child_process, child_process]
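A minimal sketch of the master/worker layout with the core `cluster` module follows. It is not the code from `containers/010-cluster`; the cap of four workers is an assumption taken from the "two to four cores" guidance, and `workerCount` and `start` are hypothetical names.

```javascript
'use strict';
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Assumption: cap the worker count instead of always using every core.
function workerCount(cpuCount, cap = 4) {
  return Math.max(1, Math.min(cpuCount, cap));
}

function start(port) {
  // isPrimary replaced isMaster in newer Node releases; accept either.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    const count = workerCount(os.cpus().length);
    for (let i = 0; i < count; i += 1) cluster.fork();
    // Replace a worker that dies so capacity stays constant.
    cluster.on('exit', (worker, code, signal) => {
      console.error('worker %d died (%s), forking a replacement', worker.process.pid, signal || code);
      cluster.fork();
    });
    return;
  }
  // Workers share the listening port; the master distributes connections.
  http.createServer((req, res) => {
    res.end('handled by ' + process.pid + '\n');
  }).listen(port);
}

// start(8000);
```

Each worker is a separate process with its own memory, so anything kept in a worker's heap is invisible to its siblings.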
  88. Is Bigger Always Better?
    OR you can use lightweight single-CPU containers and a LB in lieu of clustering.
    [Diagram labels: Load Balancer; four lightweight containers, each single core]
  89. Cluster The Services
    [Diagram labels: VM 1 (API Service, API Service, cluster); Message Bus; VM 2 (Compute Service, Compute Service, cluster)]
  90. Circuit Breaker
    [State diagram]
    • States: closed, open, checking…
    • Transitions: fail (under threshold), fail (reached threshold), timer (exponential backoff), fail, success
    • Responses: (200: OK), (503: Server Busy)
    See http://www.amazon.com/gp/product/0978739213 and http://martinfowler.com/bliki/CircuitBreaker.html
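The states and transitions named on this slide can be written down as a small state machine. This is a sketch under assumptions, not the talk's `local-fluent-circut` module: `createBreaker`, the default threshold, and the base delay are invented for illustration, and the clock is injectable so the backoff can be tested.

```javascript
'use strict';

function createBreaker({ threshold = 3, baseDelayMs = 1000, now = Date.now } = {}) {
  let state = 'closed';
  let failures = 0; // consecutive failures while closed
  let trips = 0;    // consecutive times the breaker has opened
  let retryAt = 0;

  function trip() {
    state = 'open';
    retryAt = now() + baseDelayMs * Math.pow(2, trips); // exponential backoff
    trips += 1;
  }

  return {
    get state() { return state; },
    // Returns true if the caller may attempt the protected operation.
    allow() {
      if (state === 'open' && now() >= retryAt) state = 'checking';
      return state !== 'open';
    },
    success() {
      state = 'closed';
      failures = 0;
      trips = 0;
    },
    failure() {
      if (state === 'open') return;      // late failure from an earlier call
      if (state === 'checking') { trip(); return; }
      failures += 1;
      if (failures >= threshold) trip(); // fail (reached threshold)
    }
  };
}
```

A request handler would answer 503 when `allow()` returns false, and otherwise call the dependency and report the outcome with `success()` or `failure()`. Unlike stricter implementations, this sketch lets every caller through while in the checking state.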
  91. Circuit Breaker
    local-modules/local-fluent-circut
    * This is a simplified example, and it does not strictly follow circuit-breaker state transitions. See https://github.com/yammer/circuit-breaker-js and https://github.com/mweagle/circuit-breaker for more canonical implementations.
    * You can use https://github.com/lloyd/node-toobusy for checking event loop delay.
  97. Circuit Breaker
    • Can be used with any kind of metric.
    • You can use it to “rate limit” your API.
    • Useful when you depend on other APIs that might fail.
  98. Where Were We?
    [Diagram labels: VM 1 (API Service, API Service, cluster); Message Bus; VM 2 (Compute Service, Compute Service, cluster)]
  99. Are We Missing Something?
    [Diagram labels: VM 1 (API Service, API Service, cluster, memory); Message Bus; VM 2 (Compute Service, Compute Service, cluster, memory)]
  102. Move the State Information Out
    [Diagram labels: VM 1 (API Service, API Service, cluster, redis); Message Bus; VM 2 (Compute Service, Compute Service, cluster, redis)]
    containers/011-sharing-memory
    • Use redis to solve session affinity.
    • Use token-based authentication with JWT to handle authentication ( https://scotch.io/tutorials/the-ins-and-outs-of-token-based-authentication ).
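Token-based authentication removes per-node session state because any API node holding the shared secret can check a token on its own. The sketch below signs and verifies an HS256 JWT with the core `crypto` module only to show the mechanics; `sign`, `verify`, and `base64url` are hypothetical helpers, and a maintained JWT library would be the safer choice in production.

```javascript
'use strict';
const crypto = require('crypto');

function base64url(input) {
  return Buffer.from(input).toString('base64')
    .replace(/=+$/, '').replace(/\+/g, '-').replace(/\//g, '_');
}

function fromBase64url(text) {
  return Buffer.from(text.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
}

function sign(payload, secret) {
  const header = base64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = base64url(JSON.stringify(payload));
  const mac = crypto.createHmac('sha256', secret).update(header + '.' + body).digest();
  return header + '.' + body + '.' + base64url(mac);
}

// Returns the payload, or null when the token is malformed, tampered
// with, or expired. Only HS256 is supported, whatever the header says.
function verify(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const expected = crypto.createHmac('sha256', secret)
    .update(parts[0] + '.' + parts[1]).digest();
  const actual = fromBase64url(parts[2]);
  if (actual.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(actual, expected)) return null;
  let payload;
  try {
    payload = JSON.parse(fromBase64url(parts[1]).toString('utf8'));
  } catch (err) {
    return null;
  }
  if (payload === null || typeof payload !== 'object') return null;
  if (typeof payload.exp === 'number' && nowSeconds >= payload.exp) return null;
  return payload;
}
```

State that cannot live in the token, such as revocation lists or larger session data, is what would go into the shared redis instead of a node's own memory.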
  105. Add a Load-Balancer
    [Diagram labels: Load Balancer; API Service clusters (with redis); Message Bus; Compute Service clusters (with redis)]
    containers/012-bounce
  107. Add AutoScale Rules
    [Diagram labels: Load Balancer; autoscale groups of API Service clusters (with redis); Message Bus; autoscale groups of Compute Service clusters (with redis)]
  108. Load Balancing Options
    • Load Balancing as a Service (AWS, Rackspace…)
    • Hardware Load Balancer (Cisco CEF, Barracuda, etc…)
    • Software Load Balancer
      • NGINX
      • HAProxy
      • home grown
  109. Wait! Aren’t These Actually Microservices?
    [Diagram labels: Internet; load balancer; API apps (cluster); redis; broker / message bus; compute apps (cluster) with workers (child_process); …]
  111. Wait! Aren’t These Actually Microservices?
    [Same diagram, with the API apps grouped as “API μ-Service” and the compute apps grouped as “Compute μ-Service”]
    * See Also:
      http://martinfowler.com/articles/microservice-trade-offs.html
      http://highscalability.com/blog/2014/4/8/microservices-not-a-free-lunch.html
      https://rclayton.silvrback.com/failing-at-microservices
  112. What If I Reach The Scalability Limits Within a Region?
    • That Means You’ve Become Famous
    • Scalability Will Be the Least of Your Concerns
  113. Multiple Regions
    [Diagram labels: The Internet; Round-Robin DNS; Region 1 and Region 2, each with an LB, API Services (with redis), a Message Bus, and Compute Services (with redis)]
    containers/013-round-robin
  118. Multiple Regions

    [Diagram: The Internet and Round-Robin DNS in front of two Load Balancers, Region 1, and Region 2; each region contains an LB, API Services, a Message Bus, Compute Services, and redis, now with replication.]
  121. You Can Add More

    [Diagram: The Internet and DNS in front of Region 1, Region 2, … Region N; each region has an LB, an API AutoScale Group, a Message Bus, and a Compute AutoScale Group.]
  122. How Do I Manage All This Infrastructure?

    This is Getting Out of Hand!

    [Diagram: The Internet and DNS in front of Region 1, Region 2, … Region N; each region has an LB, an API AutoScale Group, a Message Bus, and a Compute AutoScale Group.]
  123. Configuration Management Tips

    • No Hard-Coded IP Addresses in Config Files
    • Let DNS do What it Does Best
    • Use Environment Variables for Infrastructure Management
    • Converge Your Infrastructure Using a Central Service
      • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html )
      • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ )
      • Chef ( https://www.chef.io/ )
      • Puppet ( https://puppetlabs.com/ )
      • Ansible ( http://www.ansible.com/ )
    • Service Discovery
      • Consul ( https://www.consul.io/ )
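
    The first three tips can be combined in a small startup routine: read every endpoint from environment variables, expect DNS names rather than IP addresses, and refuse to start when something is missing. The sketch below is one possible shape for this, not the talk's code; the variable names (REDIS_HOST, REDIS_PORT, AMQP_URL) and host names are assumptions made for illustration.

```javascript
'use strict';

// Variable names below (REDIS_HOST, REDIS_PORT, AMQP_URL) are hypothetical;
// use whatever your provisioning tool exports.
function loadConfig(env) {
  const required = ['REDIS_HOST', 'AMQP_URL'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail fast at startup instead of falling back to a hard-coded address.
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  // 6379 is the default Redis port.
  const redisPort = Number.parseInt(env.REDIS_PORT || '6379', 10);
  if (Number.isNaN(redisPort)) {
    throw new Error('REDIS_PORT must be a number');
  }
  return {
    redis: { host: env.REDIS_HOST, port: redisPort },
    amqpUrl: env.AMQP_URL
  };
}

// In the application you would call loadConfig(process.env).
const exampleConfig = loadConfig({
  REDIS_HOST: 'redis.internal.example.com', // a DNS name, not an IP
  AMQP_URL: 'amqp://bus.internal.example.com'
});
```

    Passing the environment in as an argument, instead of reading process.env inside the function, keeps the routine easy to test and makes the set of required variables explicit.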
  130. CI / CD

    • Use a CI / CD Pipeline
    • Show Love to Test-Driven Development
    • Don’t Forget Functional Tests and Integration Tests
  131. Continuously Keep Your Code In Ship Shape

    • ESLint ( http://eslint.org )
    • CodeClimate ( https://codeclimate.com/features )
    • GreenKeeper ( http://greenkeeper.io )
    • npm scripts (instead of Grunt or Gulp; YMMV) ( https://docs.npmjs.com/misc/scripts )
    • npm outdated ( https://docs.npmjs.com/cli/outdated )
    • git pre-commit hooks ( https://github.com/observing/pre-commit )
    • [ hint: Install your development dependencies (such as eslint, babel, gulp, etc.) locally (not globally)! ]
  132. Are We Done Yet?

    [Diagram: The Internet and Round-Robin DNS in front of two Load Balancers, Region 1, and Region 2; each region contains an LB, API Services, a Message Bus, Compute Services, and redis with replication.]
  141. Making the Load Balancer HA

    [Diagram labels: client, Load Balancer; client, two Load Balancers (active, failover) with keepalived.]
    * see also: https://www.wikiwand.com/en/Virtual_Router_Redundancy_Protocol
  143. Making the Load Balancer Highly Available

    • round-robin DNS
      • https://www.wikiwand.com/en/Round-robin_DNS
    • heartbeat
      • https://www.wikiwand.com/en/Heartbeat_(computing)
    • keepalived
      • http://keepalived.org/
    * You can use these tools to make any component HA.
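
    The heartbeat idea behind these tools is simple: the standby node listens for periodic signals from the active node and takes over when they stop arriving. keepalived does this at the network level; the sketch below only illustrates the timing decision in JavaScript. The class name, the 3-second timeout, and the choice not to take over before the first heartbeat are all assumptions made for illustration.

```javascript
'use strict';

// Tracks the last heartbeat seen from the active peer and decides whether
// the standby should take over. Time is passed in explicitly so the logic
// is easy to test; in real use you would pass Date.now().
class HeartbeatMonitor {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastBeatAt = null;
  }

  // Call whenever a heartbeat arrives from the active node.
  beat(now) {
    this.lastBeatAt = now;
  }

  // True when no heartbeat has arrived within the timeout window.
  shouldFailOver(now) {
    if (this.lastBeatAt === null) {
      return false; // never saw the peer; do not take over blindly
    }
    return now - this.lastBeatAt > this.timeoutMs;
  }
}

const monitor = new HeartbeatMonitor(3000);
monitor.beat(1000);
const healthy = monitor.shouldFailOver(2500);    // 1.5 s since last beat
const failedOver = monitor.shouldFailOver(4500); // 3.5 s since last beat
```

    Here `healthy` is false and `failedOver` is true. The timeout is the main tuning knob: too short and a brief network hiccup triggers a needless failover, too long and clients wait on a dead node.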
  144. SSL Termination

    [Diagram labels: client, two Load Balancers (active, failover) with keepalived; client, two SSL Terminators (active, failover) with keepalived, two Load Balancers.]
    * https://github.com/bumptech/stud
  147. Make Redis and RabbitMQ Redundant

    [Diagram labels: redis; redis (master) with three redis (read replica) nodes, shown twice; round-robin DNS.]

    This will also increase throughput as a side benefit. See http://redis.io/topics/replication and http://redis.io/topics/cluster-tutorial. You can also use a managed “memory as a service” solution.

    See also https://www.rabbitmq.com/ha.html for how similar queue mirroring is implemented for a RabbitMQ cluster. And similarly, you can use a managed “queue as a service” solution to ease your pain ;)
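
    The throughput benefit comes from sending writes to the master while spreading reads over the replicas. The slide does the spreading with round-robin DNS; the sketch below performs the same rotation on the client side instead, purely to make the routing rule visible. The class is hypothetical, and the "connections" are plain labels standing in for client objects from whichever Redis library you use.

```javascript
'use strict';

// Routes writes to the master and spreads reads across replicas in
// round-robin order. The "connections" are plain labels here; in real
// code they would be client objects from your Redis library.
class ReplicatedStore {
  constructor(master, replicas) {
    this.master = master;
    this.replicas = replicas;
    this.next = 0;
  }

  forWrite() {
    return this.master;
  }

  forRead() {
    if (this.replicas.length === 0) {
      return this.master; // no replicas: fall back to the master
    }
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica;
  }
}

const store = new ReplicatedStore('master', ['replica-1', 'replica-2', 'replica-3']);
const reads = [store.forRead(), store.forRead(), store.forRead(), store.forRead()];
```

    Redis replication is asynchronous by default, so a read served by a replica may briefly lag behind the latest write; code that reads from replicas has to tolerate that.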
  148. Build Redundancy Everywhere

    Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.
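
    For a service discovery tool to run health checks, each service needs something cheap to check. One common option is an HTTP endpoint that reports whether the service's dependencies are reachable; Consul's HTTP checks, for instance, treat any 2xx response as passing. The sketch below is a minimal version of such an endpoint using only Node's built-in http module. The /health path, the check names, and the response shape are assumptions made for illustration.

```javascript
'use strict';

const http = require('http');

// Runs every named check and reports which ones failed.
// A check is a function that returns true when the dependency is healthy.
function evaluateHealth(checks) {
  const failed = Object.keys(checks).filter((name) => {
    try {
      return checks[name]() !== true;
    } catch (err) {
      return true; // a throwing check counts as a failure
    }
  });
  return {
    statusCode: failed.length === 0 ? 200 : 503,
    body: { healthy: failed.length === 0, failed }
  };
}

// Exposes the result on /health (a hypothetical path) so an external
// checker can poll it. The server is returned without listening.
function createHealthServer(checks) {
  return http.createServer((req, res) => {
    if (req.url !== '/health') {
      res.writeHead(404);
      res.end();
      return;
    }
    const result = evaluateHealth(checks);
    res.writeHead(result.statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(result.body));
  });
}

const sample = evaluateHealth({
  redis: () => true,
  messageBus: () => false
});
```

    To serve it, call `createHealthServer(checks).listen(port)` with a port of your choosing and point the health check of your service discovery tool at that URL.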
  155. Torture Your System

    • Try Chaos Monkey
      • https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey
    • Randomly send `kill -9` to Processes
    • Randomly Knock a Server Offline
    • Intentionally Run Out of Disk Space
    • Take an entire data center down
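
    The "randomly send `kill -9`" item is easy to try on a small scale. The sketch below picks one process ID at random and sends it SIGKILL through Node's `process.kill`. It is a toy illustration, not Chaos Monkey: the function name and the example PIDs are hypothetical, and the random source and kill function are injectable so you can dry-run it before pointing it at real processes.

```javascript
'use strict';

// Picks one PID at random and sends it SIGKILL (the signal behind `kill -9`).
// `random` and `kill` are injectable so the function can be dry-run.
function killRandomProcess(pids, options = {}) {
  const random = options.random || Math.random;
  const kill = options.kill || ((pid) => process.kill(pid, 'SIGKILL'));
  if (pids.length === 0) {
    return null;
  }
  const victim = pids[Math.floor(random() * pids.length)];
  kill(victim);
  return victim;
}

// Dry run: record the victim instead of killing anything.
// The PIDs are made up for the example.
const killed = [];
const victim = killRandomProcess([101, 102, 103], {
  random: () => 0.5,
  kill: (pid) => killed.push(pid)
});
```

    Run something like this only against processes you own, in an environment where a crash is acceptable; the point is to confirm that the rest of the system notices the loss and recovers.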
  156. Summary

    • Stateless is Better than Stateful
    • Eventual Consistency
    • Build Redundancy Everywhere!
    • Start Up Fast, Shut Down Gracefully
    • Solve Problems That Actually Exist
  157. Summary

    • Never Assume, Always Measure
    • Perf Before Scale
    • Infrastructure is Code; Automate It!
    • Keep Configuration Details in Environment Variables
    • Show Love to DNS
  158. Scale 2 ∞ & ⇒

    [Diagram: The Internet and DNS in front of Region 1, Region 2, … Region N; each region has an LB, an API AutoScale Group, a Message Bus, and a Compute AutoScale Group. Other labels: API Service, Internet, Bastion, test API.]

    Simulated by an NGINX static web server
    * Fetch HTML off of websites
    * Simplify and convert the HTML to plain text
    * Do NLP/Tokenization on the plain text
    * Create tags as a result