Slide 1

Slide 1 text

Scaling Your Node.JS API …like a boss • http://bit.ly/nodejs-rocks • Volkan Özçelik • March 7, 2016 • http://volkan.io/ • @linkibol • v0lkan

Slide 2

Slide 2 text

API Scalability (in theory)

Slide 3

Slide 3 text

API Scalability (in practice) • Nothing is linear. • $#!% will eventually happen!

Slide 4

Slide 4 text

About Me • Volkan Özçelik: JavaScript Lover & Performance Freak • Current: Technical Lead @ Cisco • Before: Mobile Frontend Engineer @ Jive Software; VP of Technology @ grou.ps (now GymGroups); CTO @ cember.net (acquired by Xing) • Chase Me: @linkibol, v0lkan

Slide 5

Slide 5 text

volkan.io Slides & Source Code

Slide 6

Slide 6 text

Agenda • Node’s Strengths and Weaknesses • Tweaking Our OS • Throughput, Concurrency, Latency • Scale a Real-Life Node App

Slide 7

Slide 7 text

In a Nutshell… How do I architect a scalable and consistent Node.JS API with manageable complexity?

Slide 10

Slide 10 text

Don’t Fight Windmills • Keep things simpler. • Build something that’s good enough for your purpose. • Solve for the problems that are actually on your plate.

Slide 11

Slide 11 text

Don’t Invent Problems That You Don’t Have (Yet) • Monitor All The Things • Collect Metrics • Form a Hypothesis • Gather Evidence • Validate Your Hypothesis • Take Corrective Action If Needed

Slide 12

Slide 12 text

Goals • Minimize Client Response Time • Maximize Resource Efficiency on the Server
 Hint: Leave 50% of the memory unused
 (for taking core dumps)

Slide 13

Slide 13 text

High-Level Topology of an API Service [Diagram components: API Service • Load Balancer (SSL Termination, Load Balancing) • API Gateway (Authentication, Authorization, Token Exchange, Rate Limiting, …) • HTTP Proxy • Clients]

Slide 15

Slide 15 text

So… JavaScript? JavaScript

Slide 16

Slide 16 text

Show Love to Functions • Accept JavaScript’s functional and composable nature. • Avoid `this` and avoid `new` — You’ll thank me later. • Create Focused, Independent, Reusable, and Testable Modules.
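
A minimal sketch of what "functional and composable" can look like in practice. The helper names (`normalizeTag`, `createTagCounter`) are made up for illustration and are not taken from the talk's source code; the point is only that small functions compose, and that a factory with a closure can stand in for `this` and `new`.

```javascript
// Small, focused, independently testable functions: no `this`, no `new` in our own API.
const trim = (text) => text.trim();
const lowerCase = (text) => text.toLowerCase();
const dashify = (text) => text.replace(/\s+/g, '-');

// compose(f, g, h)(x) === f(g(h(x)))
const compose = (...fns) => (input) =>
  fns.reduceRight((value, fn) => fn(value), input);

// Hypothetical helper built purely by composition.
const normalizeTag = compose(dashify, lowerCase, trim);

// A factory keeps its state in a closure instead of on `this`.
const createTagCounter = () => {
  const counts = new Map();
  return {
    add: (tag) => {
      const key = normalizeTag(tag);
      counts.set(key, (counts.get(key) || 0) + 1);
      return counts.get(key);
    },
    count: (tag) => counts.get(normalizeTag(tag)) || 0,
  };
};
```

Each piece can be tested on its own, and `createTagCounter()` can be called anywhere without worrying about how `this` is bound.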

Slide 17

Slide 17 text

“OO leads to anger; Anger leads to hate; Hate leads to suffering!” Embrace the Difference

Slide 18

Slide 18 text

Know Your Platform

Slide 19

Slide 19 text

Node.JS Is Perfect For… • IO-Heavy Applications • Data-Intensive Realtime Apps • RESTful / API-Driven (Micro)services • Streams • Queued (Lazy) Writes • Processing data on-the-fly https://github.com/libuv/libuv

Slide 20

Slide 20 text

Node.JS Is not For… • Serving Static Files • CPU-bound Applications • Creating a Monolithic Infrastructure

Slide 21

Slide 21 text

Node.JS is not a Swiss Army Knife • Load Balancing ➡ haproxy | NGINX | ELB 
 ( http://www.haproxy.org/ | http://nginx.org/ | http://aws.amazon.com ) • SSL Termination ➡ stud ( https://github.com/bumptech/stud ) • GZIP Compression ➡ NGINX | haproxy • Serving Static Assets ➡ CDN | NGINX | Varnish ( https://www.varnish-cache.org/ )

Slide 22

Slide 22 text

Know Your Bottlenecks • Node.JS works really well as a highly concurrent networking app. • Node.JS is very sensitive to memory leaks and blocking code. • 99% of the time you will be IO-bound.

Slide 23

Slide 23 text

Know the Ecosystem • Do Not Ignore The Ecosystem • Follow Community News and Updates • Attend Conferences (like this one) • Know Your Tools and Use Them

Slide 24

Slide 24 text

Tweaking the OS

Slide 25

Slide 25 text

Open File Limits "Error: EMFILE, Too many open files" ulimit -n 60000

Slide 28

Slide 28 text

Configuring the Load Balancer * See https://bit.ly/nginx-rocks

Slide 31

Slide 31 text

Additional Tweaks sysctl -w net.core.somaxconn=1024; (default is 128)

Slide 32

Slide 32 text

Even More Tweaks • Do NOT alter any setting that you do not understand! * See https://www.frozentux.net/ipsysctl-tutorial/ipsysctl-tutorial.html and http://www.tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/ for more info.

Slide 33

Slide 33 text

Security

Slide 34

Slide 34 text

Common Threats • XSS / CSRF • Input Validation Attack • DoS / ReDoS • Request Size * Securing Node.JS is not different from securing any other web app. See also: https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project

Slide 35

Slide 35 text

Security • https://www.owasp.org/index.php/OWASP_Node_js_Goat_Project • http://nodesecurity.io/ • https://www.owasp.org/index.php/OWASP_Zed_Attack_Proxy_Project

Slide 36

Slide 36 text

Do Not Run Node.JS As Root
 useradd -mrU web
 mkdir /opt/web-app
 chown web /opt/web-app
 cd /opt/web-app
 su web
 node app.js
 firewall-cmd --permanent --zone=public --add-port=3000/tcp
 Also, always run Node.JS behind a reverse proxy!
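
The same rule can also be enforced from inside the app as a last line of defense. The sketch below is an assumption about how one might do it, not something shown in the deck: `assertNotRoot` is a hypothetical helper that refuses to continue when the effective user is root.

```javascript
// Throws if the process runs with root privileges (POSIX only).
// `getuid` is injectable so the check can be exercised without being root.
function assertNotRoot(getuid) {
  // process.getuid does not exist on Windows; skip the check there.
  const readUid = getuid ||
    (typeof process.getuid === 'function' ? () => process.getuid() : null);
  if (!readUid) return;
  if (readUid() === 0) {
    throw new Error('Refusing to start as root; run as an unprivileged user.');
  }
}
```

Calling `assertNotRoot()` as the first statement of `app.js` makes an accidental `sudo node app.js` fail fast instead of silently running with full privileges.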

Slide 37

Slide 37 text

Let’s Get Our Hands Dirty

Slide 38

Slide 38 text

restify: containers/000-simple-app-restify • http: containers/001-simple-app-http • tcp: containers/002-simple-app-tcp

Slide 39

Slide 39 text

ab -n 10000 -c 100 http://app:8000/hello • restify: containers/000-simple-app-restify • http: containers/001-simple-app-http • tcp: containers/002-simple-app-tcp

Slide 40

Slide 40 text

Going Bare Bones • ab -n 10000 -c 100 http://app:8000/hello • Tested on MacBook Pro, 2.4 GHz Intel Core i5, 16 GB 1600 MHz DDR3
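
For readers without the demo containers at hand, a bare-bones `/hello` endpoint on Node's built-in `http` module might look like the sketch below. The response body and the function names are assumptions; the actual code in `containers/001-simple-app-http` may differ.

```javascript
const http = require('http');

// One plain function, no framework: this is the whole request pipeline.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/hello') {
    const body = JSON.stringify({ hello: 'world' }); // assumed payload
    res.writeHead(200, {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(body),
    });
    return res.end(body);
  }
  res.writeHead(404);
  res.end();
}

// Returns a server that is not listening yet; the caller picks the port.
function createApp() {
  return http.createServer(handleRequest);
}
```

`createApp().listen(8000)` would then serve the URL that the `ab` command targets. Routing, body parsing, and error handling are all left to you, which is exactly the trade-off the next slide lists.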

Slide 41

Slide 41 text

Is It Worth It? • You Can Go Bare-Bones for Maximum Throughput • Tradeoff: • Harder to maintain • More complex code • Error prone • Lots of edge cases • Harder to use additional tooling

Slide 42

Slide 42 text

Concurrency

Slide 43

Slide 43 text

Throughput vs Concurrency

Slide 48

Slide 48 text

Throughput vs Concurrency [Chart: as concurrency grows, throughput goes through four phases: linear increase • slowdown • almost constant • rapid decline]

Slide 53

Slide 53 text

Distributed Load Testing Toolbox • Apps • jMeter: http://jmeter.apache.org • Gatling: http://gatling.io/#/ • The Grinder: http://grinder.sourceforge.net • Locust: https://github.com/locustio/locust • “as a service” • flood.io: https://flood.io • loader.io: http://loader.io • LoadImpact: https://loadimpact.com • BlazeMeter: https://www.blazemeter.com • LoadStorm: http://loadstorm.com

Slide 54

Slide 54 text

Latency

Slide 56

Slide 56 text

Latency and Throughput

Slide 58

Slide 58 text

Lessons Learned

Slide 59

Slide 59 text

Lessons Learned • Latency Kills • Know Your Platform & Know Your Tools • For maximum throughput go bare bones • Tradeoff: Giving up all the benefits a framework has to offer • Low-level code is harder to maintain: • Harder to Test and Verify / Easier to Create Bugs and Regressions • Corollary: As you add additional layers of abstractions, your API will marginally slow down. • The Inception Rule: More than three levels and you’re lost forever!

Slide 60

Slide 60 text

Perf Before Scale

Slide 61

Slide 61 text

Perf Advice Is Addictive Optimization without measurement is futile.

Slide 62

Slide 62 text

Perf Before Scale • Rule #1: 
 Avoid premature optimization.
 Do measurements, and optimize what matters. • Tweak Your System for High Performance • Cache All The Things • Cache at every level. • The fastest API response is no response at all. • Delegate Long-Running/CPU-Intensive(*) Operations • Be Lazy Whenever Possible

Slide 63

Slide 63 text

Optimization • You can’t optimize what you don’t measure.

Slide 64

Slide 64 text

Things to Watch Out For • Always Keep an Eye on the Event Loop • Your API Service may Become CPU-Bound • External API Calls Can Be a Bottleneck • Track Heap Usage Over Time • Implement Sanity Checks • Implement Circuit Breakers • Have an Upper Bound for Concurrency
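
"Keeping an eye on the event loop" usually means measuring how late timers fire. The sketch below is one simple way to do that with nothing but timers; `watchEventLoop` and `lagMs` are hypothetical names, and the threshold values are up to you.

```javascript
// How late (in ms) a timer fired, compared with when it was due.
function lagMs(startedAtMs, firedAtMs, intervalMs) {
  return Math.max(0, firedAtMs - startedAtMs - intervalMs);
}

// Calls onLag(lag) whenever the loop was blocked for more than maxLagMs.
function watchEventLoop(options, onLag) {
  const intervalMs = options.intervalMs;
  const maxLagMs = options.maxLagMs;
  let last = Date.now();
  const timer = setInterval(function checkLag() {
    const current = Date.now();
    const lag = lagMs(last, current, intervalMs);
    last = current;
    if (lag > maxLagMs) onLag(lag);
  }, intervalMs);
  timer.unref(); // monitoring alone should not keep the process alive
  return function stop() { clearInterval(timer); };
}
```

A sustained lag is a strong hint that something CPU-bound or synchronous is running on the main thread; it can feed a health check or a load-shedding decision.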

Slide 65

Slide 65 text

Things to Watch Out For • Is the app running and functional? • Is the app overloaded? • How many errors have been raised so far? • Is the app performant (throughput, memory utilization, concurrency)? • Is my cluster healthy? • How many times have forks been restarted? • Are all clustered forks alive and okay?

Slide 66

Slide 66 text

Which Will (most of the time) Boil Down to… • Watching Response Times • Watching CPU Utilization + General Sys Resource Usage • Watching Number of Concurrent Connections

Slide 67

Slide 67 text

v8 Optimizations

Slide 68

Slide 68 text

Types of Compilers in v8 • Generic Compiler • Optimizing Compiler (Crankshaft) • Can Be Two or More Orders of Magnitude Faster See also: * https://wingolog.org/archives/2011/07/05/v8-a-tale-of-two-compilers * http://thibaultlaurens.github.io/javascript/2013/04/29/how-the-v8-engine-works/ * http://www.html5rocks.com/en/tutorials/speed/v8/

Slide 69

Slide 69 text

X-Ray View Into the v8 Compiler node --trace_opt 
 --trace_deopt 
 --allow-natives-syntax test.js; • console.log(%HasFastProperties(obj)) • console.log(%GetOptimizationStatus(fn)) https://github.com/Nathanaela/v8-natives

Slide 71

Slide 71 text

Optimize Hot Code Paths Only (Unless You Have Solid Evidence to Do Otherwise)

Slide 72

Slide 72 text

v8 Optimization Killers • Using debugger anywhere within the function. • Using eval anywhere within the function. • Using with anywhere within the function. • Using try/catch anywhere within the function. * ~via https://github.com/petkaantonov/bluebird/wiki/Optimization-killers

Slide 73

Slide 73 text

Typical Example: try/catch Inside Function

Slide 74

Slide 74 text

Isolate try/catch
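
The code on this slide did not survive extraction, so the following is only a sketch of the general technique: move the `try/catch` into a tiny helper so that the hot function itself contains none. The function names are illustrative. This mattered for the Crankshaft-era V8 the deck describes; on a newer runtime, measure before assuming it still helps.

```javascript
// The only function that contains try/catch. It is tiny on purpose.
function tryParseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

// The hot loop stays free of try/catch, so the optimizer can handle it.
function sumPayloads(lines) {
  let total = 0;
  for (let i = 0; i < lines.length; i++) {
    const parsed = tryParseJson(lines[i]);
    if (parsed.ok && parsed.value !== null && typeof parsed.value.n === 'number') {
      total += parsed.value.n;
    }
  }
  return total;
}
```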

Slide 78

Slide 78 text

Perform Lazy Evaluations
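
The slide's own code is not part of this transcript. As one possible illustration of lazy evaluation, the sketch below defers an expensive setup step until the first request that needs it and then reuses the result. `lazy`, `getStopWords`, and `extractTags` are hypothetical names.

```javascript
// Wrap an expensive computation so it runs at most once, on first use.
function lazy(compute) {
  let done = false;
  let value;
  return function get() {
    if (!done) {
      value = compute();
      done = true;
    }
    return value;
  };
}

// Hypothetical example: the stop-word set is built only if tagging is ever requested.
let builds = 0;
const getStopWords = lazy(function buildStopWords() {
  builds += 1;
  return new Set(['a', 'an', 'and', 'the', 'of']);
});

function extractTags(text) {
  const stopWords = getStopWords();
  return text.toLowerCase().split(/\W+/).filter((word) => word && !stopWords.has(word));
}
```

Nothing is computed at startup; the cost is paid once by the first caller, and later calls only pay for a function call.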

Slide 84

Slide 84 text

Perform Lazy Evaluations <=Async <=Async <=Sync

Slide 86

Slide 86 text

Let’s Create Something Real

Slide 87

Slide 87 text

Let’s Create Something Real • An API that… • Auto-suggest tags, given a url • Lists related URLs, given a tag

Slide 88

Slide 88 text

containers/003-the-real-deal

Slide 92

Slide 92 text

Initial Topology • Bastion: used to test the API • API Service: * Fetch HTML off of websites * Simplify and convert the HTML to plain text * Do NLP/Tokenization on the plain text * Create tags as a result • Internet: simulated by an NGINX static web server

Slide 93

Slide 93 text

Let’s Test How Our API Performs

Slide 94

Slide 94 text

Findings • get-tags appears to be CPU-bound. • While get-tags is being requested, get-urls becomes two orders of magnitude slower. • On its own, get-urls appears to be pretty fast, and it is not CPU-bound.

Slide 95

Slide 95 text

How Can We Be Sure? • Add probes (DTrace, XTrace… etc) 
 to trace what’s happening. • Create a REPL to check the app at runtime. containers/004-demo-w-instrumentation

Slide 96

Slide 96 text

Creating a REPL • You Can Expose Internal State via an API and/or a CLI/REPL • vantage: https://github.com/dthree/vantage • kang: https://github.com/davepacheco/kang • repl server: https://nodejs.org/api/repl.html • Expose Additional Logging Info at Runtime (in systems that support it) • bunyan -p ( https://github.com/trentm/node-bunyan )
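
A minimal sketch of the "repl server" option using only Node's built-in `net` and `repl` modules. `createReplServer` and the `state` object are assumptions made for illustration; the demo container may wire this up differently (for example with vantage).

```javascript
const net = require('net');
const repl = require('repl');

// Returns a TCP server (not yet listening) that gives each connection a REPL.
// `state` is a hypothetical object holding whatever you want to inspect at runtime.
function createReplServer(state) {
  return net.createServer(function onConnection(socket) {
    const session = repl.start({
      prompt: 'app> ',
      input: socket,
      output: socket,
      terminal: false,
    });
    session.context.state = state; // reachable as `state` inside the REPL
    session.on('exit', function onExit() { socket.end(); });
    socket.on('error', function onSocketError() { session.close(); });
  });
}
```

Something like `createReplServer(appState).listen(5001, '127.0.0.1')` followed by `nc 127.0.0.1 5001` gives a prompt inside the running process. A REPL can execute arbitrary code, so it should only ever be bound to the loopback interface or an equally trusted channel.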

Slide 97

Slide 97 text

The REPL

Slide 100

Slide 100 text

Adding Probes

Slide 101

Slide 101 text

Adding Probes * See https://github.com/v0lkan/kiraz App Node

Slide 104

Slide 104 text

Adding Probes Bastion Host

Slide 107

Slide 107 text

Findings (get-tags)

Slide 110

Slide 110 text

Monitoring Toolbox • Runtime Performance Probing (Kernel-Level Tools) • Linux Perf Events ( https://perf.wiki.kernel.org/index.php/Main_Page ) perf record -F 71 -p `pgrep -n node` -g -- sleep 30 node --perf_basic_prof_only_functions • Dtrace ( http://dtrace.org/blogs/about/ ) • Tracking Transactions and Tracing Latency • Zipkin ( https://github.com/openzipkin/zipkin ) • Runtime Memory Usage (heap stats, heap diffing, leak detection) • Memwatch ( https://github.com/lloyd/node-memwatch ) • See http://jayconrod.com/posts/55/a-tour-of-v8-garbage-collection for details.

Slide 113

Slide 113 text

Monitoring Toolbox • Monitoring “as a service” • nodetime https://nodetime.com/ • newrelic http://newrelic.com/nodejs • strongloop https://strongloop.com/node-js/performance-monitoring/ • keymetrics https://keymetrics.io/ • appdynamics https://www.appdynamics.com/nodejs/ • …

Slide 114

Slide 114 text

So… Something Is CPU-Intensive • get-tags is CPU-bound, and it also blocks the event loop. • What can we do? • Split out the computationally heavy parts, fork them as child processes, and use external libraries. • Create a native Node.JS extension that does not block the event loop. • Refactor the compute logic into a separate service first.
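
To make the first option concrete, here is a single-file sketch that pushes a CPU-bound computation into a separate Node process. It uses `child_process.execFile` with an inline script so the example is self-contained; a real service would more likely use `child_process.fork()` with a dedicated worker module and IPC messages. The Fibonacci function merely stands in for the real tagging work.

```javascript
const { execFile } = require('child_process');

// Deliberately slow, CPU-bound work. It runs in the child, not in the API process.
const CHILD_SCRIPT = `
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  const input = Number(process.argv[process.argv.length - 1]);
  process.stdout.write(String(fib(input)));
`;

// The parent's event loop stays free while the child computes.
function fibInChild(n, callback) {
  execFile(
    process.execPath,
    ['-e', CHILD_SCRIPT, String(n)],
    function onDone(err, stdout) {
      if (err) return callback(err);
      callback(null, Number(stdout));
    }
  );
}
```

Spawning a process per request is expensive, so in practice a small pool of long-lived workers is the usual next step.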

Slide 115

Slide 115 text

So… Something Is CPU-Intensive [Diagram: the app, with its own memory, hands work to three workers started via child_process]

Slide 120

Slide 120 text

Split App and Compute Nodes • API Service ↔ Message Bus ↔ Compute Service • Message bus options: rabbitmq, zeromq, resque etc.; see also http://queues.io/ • containers/005-demo-split-compute

Slide 121

Slide 121 text

Message Bus Topologies • Send/Listen: P → C • Worker Queue: P → C1, C2 • PubSub: P → X → C1, C2

Slide 123

Slide 123 text

Split App and Compute Nodes (Message Bus) [Diagram: the API Services publish to one RabbitMQ Request Queue • requests reach the Compute Services by round-robin dispatch • one Response Queue per API Service carries the results back]
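
The request/response flow in this topology can be sketched without a broker. The code below is an in-memory stand-in, not RabbitMQ client code: `createBus`, `createApiService`, and `createComputeService` are hypothetical, and the "tagging" is a placeholder. What it does show is the three ideas from the diagram: a shared request queue, round-robin dispatch, and a per-service response queue matched by correlation ID.

```javascript
// Hypothetical in-memory stand-in for a broker such as RabbitMQ.
function createBus() {
  const workers = [];               // consumers of the shared request queue
  const responseQueues = new Map(); // replyTo name -> response handler
  let next = 0;
  return {
    consume(handler) { workers.push(handler); },
    bindResponseQueue(name, handler) { responseQueues.set(name, handler); },
    publishRequest(message) {
      if (workers.length === 0) throw new Error('no compute service available');
      const worker = workers[next % workers.length]; // round-robin dispatch
      next += 1;
      worker(message, function reply(body) {
        const deliver = responseQueues.get(message.replyTo);
        deliver({ correlationId: message.correlationId, body: body });
      });
    },
  };
}

// Each API service owns one response queue and matches replies by correlation ID.
function createApiService(bus, name) {
  const pending = new Map();
  let sequence = 0;
  bus.bindResponseQueue(name, function onResponse(message) {
    const callback = pending.get(message.correlationId);
    pending.delete(message.correlationId);
    if (callback) callback(message.body);
  });
  return {
    getTags(url, callback) {
      sequence += 1;
      const correlationId = name + '-' + sequence;
      pending.set(correlationId, callback);
      bus.publishRequest({ correlationId: correlationId, replyTo: name, body: { url: url } });
    },
  };
}

// Compute services do the heavy lifting; this one just splits the URL into words.
function createComputeService(bus, id) {
  bus.consume(function onRequest(message, reply) {
    reply({ handledBy: id, tags: message.body.url.split(/\W+/).filter(Boolean) });
  });
}
```

With a real broker the same roles map onto a durable request queue, a consumer per compute service, and a reply queue per API service.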

Slide 129

Slide 129 text

Log Aggregation • “Good developers debug. Great developers read logs.”

Slide 130

Slide 130 text

Aggregate and Rotate Your Log Files [Diagram components: Log Aggregator • Compute Service (memory) • API Service (memory) • Message Bus] containers/006-demo-eventbus-logaggr

Slide 133

Slide 133 text

Use a Decent Logger • Bunyan ( https://github.com/trentm/node-bunyan ) • Winston ( https://github.com/winstonjs/winston ) • Log4JS ( https://github.com/nomiddlename/log4js-node )

Slide 134

Slide 134 text

What to Log • Authentication & Authorization • Session Management • Method Entry Points • Errors and Weird Events • Specific Events (startup, shutdown, slowdown etc.) • High-Risk Functionalities (payments, privileges, admins etc)

Slide 135

Slide 135 text

Log Analysis Toolbox • Loggly ( https://www.loggly.com/ ) • ELK Stack ( https://www.elastic.co/products ) • Nagios Log Server ( https://www.nagios.com/products/nagios-log-server/ ) • Splunk ( http://www.splunk.com/en_us/homepage.html ) • …

Slide 136

Slide 136 text

Utilize Caching Compute Service API Service in-memory cache Message Bus containers/006-demo-eventbus-logaggr
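
A minimal sketch of an in-memory cache sitting in front of a slow lookup, assuming a simple time-to-live policy. `createCache` and `withCache` are hypothetical names; the demo container may use a library or a different eviction strategy. This version has no size limit, so a production cache would also need a bound on the number of entries.

```javascript
// In-memory cache with a time-to-live. `now` is injectable for testing.
function createCache(ttlMs, now) {
  const clock = now || Date.now;
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (clock() >= entry.expiresAt) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value: value, expiresAt: clock() + ttlMs });
    },
  };
}

// Wrap a callback-style lookup so repeated requests skip the slow path.
function withCache(cache, lookup) {
  return function cachedLookup(key, callback) {
    const hit = cache.get(key);
    if (hit !== undefined) {
      // Stay asynchronous on a hit so callers always see the same ordering.
      return process.nextTick(callback, null, hit);
    }
    lookup(key, function onResult(err, value) {
      if (!err) cache.set(key, value);
      callback(err, value);
    });
  };
}
```

Wrapping the call that goes out over the message bus means a repeated request for the same URL never reaches the compute service while the entry is fresh.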

Slide 141

Slide 141 text

What If My App Crashes?

Slide 142

Slide 142 text

Processes Die Accept it

Slide 143

Slide 143 text

Processes Die. Accept It. • No system is 100% resilient. • Every crash is important. • Every exception is important too: adopt a “Zero Exception Policy”.

Slide 144

Slide 144 text

containers/007-demo-nodejs-as-a-service Processes Die Accept it

Slide 145

Slide 145 text

Keep It Running •forever ( https://github.com/foreverjs/forever ) •pm2 ( https://github.com/Unitech/pm2 ) •upstart ( http://upstart.ubuntu.com/ ) •systemd ( https://www.wikiwand.com/en/Systemd )

Slide 149

Slide 149 text

Processes Die Accept it https://www.joyent.com/blog/mdb-and-node-js

Slide 150

Slide 150 text

Debugging

Slide 151

Slide 151 text

Live Debugging Given the Tornado, Where’s the Butterfly?

Slide 152

Slide 152 text

Post-Mortem Debugging

Slide 153

Slide 153 text

Node.JS Debugging Myths • Debugging and Profiling in Node.JS is Hard • Debugging and Profiling in Node.JS is Immature • You Cannot Debug or Profile a Live Production Node.JS App

Slide 154

Slide 154 text

Debugging • Live Debugging (using a REPL) • Remote Debugging 
 (Node Inspector https://github.com/node-inspector/node-inspector, 
 WebStorm https://www.jetbrains.com/webstorm/, 
 Cloud9 IDE https://c9.io/) • Post-Mortem Debugging 
 (MDB: https://github.com/joyent/mdb_v8)

Slide 159

Slide 159 text

Flame Graphs http://www.brendangregg.com/flamegraphs.html http://github.com/brendangregg/FlameGraph

Slide 160

Slide 160 text

Flame Graphs & Core Dumps • Core Dumps • Can Be Created When Node.JS Crashes ( --abort_on_uncaught_exception ) • Can Be Created at Runtime ( using gcore * ) • Flame Graphs • You Can Use dtrace + stackvis to generate them ** • You Can Use perf events + Flame Graphs Tool to generate them *** • * http://man7.org/linux/man-pages/man1/gcore.1.html • ** http://blog.nodejs.org/2012/04/25/profiling-node-js/ • *** http://yunong.io/2015/11/23/generating-node-js-flame-graphs/

Slide 161

Slide 161 text

Debugging (Profiling) • Use Kernel Level Tools • DTrace (Solaris, BSD), perf (Linux), and XPerf (Windows) • Can be used in production • Use the v8 Profiler • Not quite suitable for production

Slide 162

Slide 162 text

v8 Profiler

Slide 163

Slide 163 text

v8 Profiler • node --v8-options | grep gc — node --v8-options | grep '\-\-trace' • `node --perf_basic_prof_only_functions .` => for perf events (new in Node 5) • `node --expose_gc --trace_gc --trace_gc_object_stats 
 --trace_gc_verbose --gc_global .` => traces to the console • `node --prof --log_timer_events --track_gc_object_stats
 --log_internal-timer_events --no-use-inlining .` => creates a perf log file * See also: http://www.chromium.org/developers/creating-v8-profiling-timeline-plots

Slide 168

Slide 168 text

Debugging Demo containers/008-demo-watching-for-leaks

Slide 169

Slide 169 text

Help the Debugger • Always Name Your Functions • Don’t let the errors go unhandled. • Emit “error” events instead of throwing exceptions. • Use an error library: • https://github.com/davepacheco/node-verror • Put a descriptive message before raising an error.
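
A small sketch that combines three of these points: every function is named, the failure is reported through an "error" event instead of a `throw`, and the message says what went wrong and with which input. `fetchTags` and the `EINVALIDURL` code are made up for illustration.

```javascript
const { EventEmitter } = require('events');

// Named functions (not anonymous ones) show up by name in stack traces,
// profiles, and flame graphs.
function fetchTags(url) {
  const request = new EventEmitter();
  setImmediate(function resolveTags() {
    if (typeof url !== 'string' || url.length === 0) {
      // Describe what failed and with which input.
      const error = new Error(
        'fetchTags: expected a non-empty url, got ' + JSON.stringify(url)
      );
      error.code = 'EINVALIDURL'; // hypothetical error code
      return request.emit('error', error);
    }
    request.emit('tags', url.split(/\W+/).filter(Boolean));
  });
  return request;
}
```

An "error" event with no listener is rethrown by Node and will crash the process, so every caller has to attach an `on('error', …)` handler; that is what keeps errors from going unhandled by accident.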

Slide 171

Slide 171 text

Use a Private NPM [Diagram components: Log Aggregator • Compute Service (memory) • API Service (memory) • Message Bus • Private NPM • Public NPM • cache / mirror] containers/009-demo-setting-up-private-npm (sinopia) https://github.com/rlidwka/sinopia

Slide 183

Slide 183 text

Use a Private NPM ../../../../../../wtf ?!

Slide 188

Slide 188 text

Use a Private NPM (prefix local modules)

Slide 190

Slide 190 text

Use a Private NPM • Promotes modularization and code re-use. • Modules are cached, hence faster to install. • You can continue your work, even when the public registry goes offline. • Makes refactoring and testing easier. • No more “../../../..”s!

Slide 191

Slide 191 text

Clustering :(){
 :|:&
 };:

Slide 192

Slide 192 text

Clustering containers/010-cluster * See http://docs.libuv.org/en/v1.x/threadpool.html https://nikhilm.github.io/uvbook/processes.html https://nikhilm.github.io/uvbook/threads.html for how the dark magic works internally. * See also https://strongloop.com/strongblog/whats-new-in-node-js-v0-12-cluster-round-robin-load-balancing/ for how the load balancing between processes in the cluster module evolved over time;
 and see https://github.com/nodejs/node-v0.x-archive/commit/e72cd41 
 for the Round-Robin cluster load balancing algorithm.

Slide 193

Slide 193 text

Clustering [Diagram: app processes and their memory] * See https://strongloop.com/strongblog/whats-new-in-node-js-v0-12-cluster-round-robin-load-balancing/ for how the load balancing between processes in the cluster module evolved over time;
 and see https://github.com/nodejs/node-v0.x-archive/commit/e72cd41 
 for the Round-Robin cluster load balancing algorithm.

Slide 194

Slide 194 text

Is Bigger Always Better? ultra mega super box with bazillion cores regular box

Slide 195

Slide 195 text

How Many Workers Per VM? • Two to four cores per VM is an ideal balance. [Diagram: one master process with four child_process workers]
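
A minimal sketch of that layout with Node's built-in `cluster` module. `workerCountFor` encodes the "two to four" guidance as a cap of four, and `startWorker` is a hypothetical function that boots your HTTP server; neither name comes from the demo in `containers/010-cluster`.

```javascript
const cluster = require('cluster');
const os = require('os');

// The slide suggests two to four cores per VM, so cap the worker count at four.
function workerCountFor(cpuCount) {
  return Math.max(1, Math.min(cpuCount, 4));
}

// `startWorker` is a hypothetical function that starts the HTTP server.
function startCluster(startWorker) {
  // cluster.isPrimary replaced cluster.isMaster in newer Node.js releases.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (!isPrimary) return startWorker();

  const count = workerCountFor(os.cpus().length);
  for (let i = 0; i < count; i++) cluster.fork();

  // Processes die: replace a worker as soon as it exits.
  cluster.on('exit', function onWorkerExit(worker, code, signal) {
    console.error('worker %d died (%s); forking a replacement',
      worker.process.pid, signal || code);
    cluster.fork();
  });
}
```

The master only supervises; every request is served by a worker, so a crash in one worker does not take the whole service down.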

Slide 196

Slide 196 text

Is Bigger Always Better? OR you can use lightweight single-CPU containers and a LB in lieu of clustering lightweight container lightweight container Load Balancer lightweight container lightweight container <-single core <-single core <-single core <-single core

Slide 197

Slide 197 text

Cluster The Services VM 2 Compute Service Compute Service cluster API Service API Service cluster VM 1 Message Bus

Slide 198

Slide 198 text

Cluster

Slide 201

Slide 201 text

Cluster + Zero Downtime Rolling Deployments

Slide 202

Slide 202 text

Zero Downtime Rolling Deployments • kill -USR2
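
The slides that followed were diagrams, so the exact mechanism is not in this transcript. A common way to implement it, sketched here under that assumption, is for the cluster master to react to `SIGUSR2` by replacing workers one at a time: fork a fresh worker, wait until it is listening, and only then disconnect an old one. `rollingRestart` and `installReloadHandler` are hypothetical names.

```javascript
const cluster = require('cluster');

// Replace workers one at a time so that some worker is always serving.
// `fork` is injected so the logic can be exercised without real processes.
function rollingRestart(workers, fork, done) {
  const queue = workers.slice();
  (function replaceNext() {
    const old = queue.shift();
    if (!old) return done();
    const fresh = fork();
    // Retire the old worker only once its replacement accepts connections.
    fresh.once('listening', function onListening() {
      old.once('exit', replaceNext);
      old.disconnect(); // lets in-flight requests finish before the worker exits
    });
  })();
}

// Wiring for a real cluster master (defined here, not executed).
function installReloadHandler() {
  process.on('SIGUSR2', function onReload() {
    const current = Object.keys(cluster.workers).map(function byId(id) {
      return cluster.workers[id];
    });
    rollingRestart(current, function forkWorker() { return cluster.fork(); },
      function onDone() { console.log('reload complete'); });
  });
}
```

After deploying new code to disk, sending `SIGUSR2` to the master process triggers the reload; newly forked workers load the new code while the old ones drain.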

Slide 209

Slide 209 text

Circuit Breaker closed fail (under threshold) open fail (reached threshold) checking… timer (exponential backoff) fail success See http://www.amazon.com/gp/product/0978739213 and http://martinfowler.com/bliki/CircuitBreaker.html (503: Server Busy) (200: OK)
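
The state diagram above can be sketched as a small class. This is an illustration under assumptions, not the deck's local module: the option names, default values, and injectable clock are made up for the example, and while "checking" it lets every call through rather than a single trial request.

```javascript
class CircuitBreaker {
  constructor({ threshold = 3, baseDelayMs = 1000, maxDelayMs = 60000, now = Date.now } = {}) {
    this.threshold = threshold;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.now = now;
    this.state = 'closed';
    this.failures = 0;
    this.trips = 0;   // how many times in a row the breaker has opened
    this.retryAt = 0;
  }

  // True if a call may go through; callers answer 503 when this is false.
  allow() {
    if (this.state === 'open' && this.now() >= this.retryAt) {
      this.state = 'checking'; // timer elapsed: probe the dependency again
    }
    return this.state !== 'open';
  }

  success() {
    this.state = 'closed';
    this.failures = 0;
    this.trips = 0;
  }

  failure() {
    this.failures += 1;
    // A failed probe reopens immediately; otherwise wait for the threshold.
    if (this.state === 'checking' || this.failures >= this.threshold) this.open();
  }

  open() {
    // Exponential backoff: the delay doubles on every consecutive trip.
    const delay = Math.min(this.baseDelayMs * 2 ** this.trips, this.maxDelayMs);
    this.trips += 1;
    this.state = 'open';
    this.retryAt = this.now() + delay;
  }
}
```

A request handler would call allow() first, respond with 503 when it returns false, and otherwise report the outcome of the downstream call through success() or failure().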

Slide 210

Slide 210 text

Circuit Breaker * This is a simplified example, and it does not strictly follow circuit-breaker state transitions. See https://github.com/yammer/circuit-breaker-js and https://github.com/mweagle/circuit-breaker for more canonical implementations. local-modules/local-fluent-circut * You can use https://github.com/lloyd/node-toobusy for checking event loop delay.

Slide 211

Slide 211 text

Circuit Breaker * This is a simplified example, and it does not strictly follow circuit-breaker state transitions. See https://github.com/yammer/circuit-breaker-js and https://github.com/mweagle/circuit-breaker for more canonical implementations. local-modules/local-fluent-circut * You can use https://github.com/lloyd/node-toobusy for checking event loop delay.

Slide 212

Slide 212 text

Circuit Breaker * This is a simplified example, and it does not strictly follow circuit-breaker state transitions. See https://github.com/yammer/circuit-breaker-js and https://github.com/mweagle/circuit-breaker for more canonical implementations. local-modules/local-fluent-circut * You can use https://github.com/lloyd/node-toobusy for checking event loop delay.

Slide 213

Slide 213 text

Circuit Breaker * This is a simplified example, and it does not strictly follow circuit-breaker state transitions. See https://github.com/yammer/circuit-breaker-js and https://github.com/mweagle/circuit-breaker for more canonical implementations. local-modules/local-fluent-circut * You can use https://github.com/lloyd/node-toobusy for checking event loop delay.

Slide 214

Slide 214 text

Circuit Breaker * This is a simplified example, and it does not strictly follow circuit-breaker state transitions. See https://github.com/yammer/circuit-breaker-js and https://github.com/mweagle/circuit-breaker for more canonical implementations. local-modules/local-fluent-circut * You can use https://github.com/lloyd/node-toobusy for checking event loop delay.

Slide 215

Slide 215 text

Circuit Breaker * This is a simplified example, and it does not strictly follow circuit-breaker state transitions. See https://github.com/yammer/circuit-breaker-js and https://github.com/mweagle/circuit-breaker for more canonical implementations. local-modules/local-fluent-circut * You can use https://github.com/lloyd/node-toobusy for checking event loop delay.

Slide 216

Slide 216 text

Circuit Breaker • Can be used with any kind of metric. • You can use it to “rate limit” your API. • Useful when you depend on other APIs that might fail.

Slide 217

Slide 217 text

Where Were We? VM 2 Compute Service Compute Service cluster API Service API Service cluster VM 1 Message Bus

Slide 218

Slide 218 text

VM 2 Compute Service Compute Service cluster API Service API Service cluster VM 1 Message Bus memory memory Are We Missing Something?

Slide 219

Slide 219 text

VM 2 Compute Service Compute Service cluster API Service API Service cluster VM 1 Message Bus memory memory Are We Missing Something?

Slide 220

Slide 220 text

VM 2 Compute Service Compute Service cluster API Service API Service cluster VM 1 Message Bus memory memory Are We Missing Something?

Slide 221

Slide 221 text

Move the State Information Out VM 2 Compute Service Compute Service cluster redis API Service API Service redis cluster VM 1 Message Bus containers/011-sharing-memory • Use redis to solve session affinity. • Use token-based authentication with JWT to handle authentication 
 ( https://scotch.io/tutorials/the-ins-and-outs-of-token-based-authentication ).
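
To make the token-based idea concrete, here is a hedged sketch of HS256 JWT signing and verification using only Node's crypto module. The function names are illustrative, and this is not the code from containers/011-sharing-memory; in production a maintained JWT library is the safer choice.

```javascript
const crypto = require('crypto');

// base64url: base64 with "-" and "_" instead of "+" and "/", no padding.
const b64url = (input) => Buffer.from(input).toString('base64')
  .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');

const hmac = (data, secret) =>
  b64url(crypto.createHmac('sha256', secret).update(data).digest());

function sign(payload, secret) {
  const head = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  return `${head}.${body}.${hmac(`${head}.${body}`, secret)}`;
}

// Returns the payload, or null if the token is malformed, forged, or expired.
function verify(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;
  const given = Buffer.from(sig);
  const expected = Buffer.from(hmac(`${head}.${body}`, secret));
  // Constant-time comparison; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  const json = Buffer.from(body.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
  const payload = JSON.parse(json.toString('utf8'));
  if (typeof payload.exp === 'number' && payload.exp <= nowSeconds) return null;
  return payload;
}
```

Because any instance that knows the secret can verify the token, no worker or VM has to remember who logged in where.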

Slide 222

Slide 222 text

Move the State Information Out VM 2 Compute Service Compute Service cluster redis API Service API Service redis cluster VM 1 Message Bus containers/011-sharing-memory • Use redis to solve session affinity. • Use token-based authentication with JWT to handle authentication 
 ( https://scotch.io/tutorials/the-ins-and-outs-of-token-based-authentication ).

Slide 223

Slide 223 text

Move the State Information Out VM 2 Compute Service Compute Service cluster redis API Service API Service redis cluster VM 1 Message Bus containers/011-sharing-memory • Use redis to solve session affinity. • Use token-based authentication with JWT to handle authentication 
 ( https://scotch.io/tutorials/the-ins-and-outs-of-token-based-authentication ).

Slide 224

Slide 224 text

Add a Load-Balancer Compute Service Compute Service cluster redis API Service API Service redis cluster Compute Service Compute Service cluster API Service API Service cluster Load Balancer Message Bus containers/012-bounce

Slide 225

Slide 225 text

Add a Load-Balancer Compute Service Compute Service cluster redis API Service API Service redis cluster Compute Service Compute Service cluster API Service API Service cluster Load Balancer Message Bus containers/012-bounce

Slide 226

Slide 226 text

Add AutoScale Rules autoscale groups Compute Service Compute Service cluster redis API Service API Service redis cluster Compute Service Compute Service cluster API Service API Service cluster Load Balancer Message Bus

Slide 227

Slide 227 text

Load Balancing Options • Load Balancing as a Service (AWS, Rackspace…) • Hardware Load Balancer (Cisco CEF, Barracuda, etc…) • Software Load Balancer • NGINX • HAProxy • home grown
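
For the "home grown" option, the core of a software load balancer is choosing the next backend. Below is a minimal round-robin sketch that skips backends marked unhealthy; the class and method names are hypothetical, and the proxying itself is left out.

```javascript
class RoundRobinPool {
  constructor(backends) {
    this.backends = backends.map((url) => ({ url, healthy: true }));
    this.cursor = 0;
  }

  // Called by whatever health checking you run against the backends.
  markHealth(url, healthy) {
    const backend = this.backends.find((b) => b.url === url);
    if (backend) backend.healthy = healthy;
  }

  // Returns the next healthy backend URL, or null if none is available.
  next() {
    for (let i = 0; i < this.backends.length; i++) {
      const backend = this.backends[this.cursor];
      this.cursor = (this.cursor + 1) % this.backends.length;
      if (backend.healthy) return backend.url;
    }
    return null;
  }
}
```

NGINX and HAProxy provide this, plus connection handling and health checks, out of the box, which is usually a good reason not to build your own.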

Slide 228

Slide 228 text

Load Balancer

Slide 229

Slide 229 text

Load Balancer

Slide 230

Slide 230 text

Load Balancer

Slide 231

Slide 231 text

Load Balancer

Slide 232

Slide 232 text

Load Balancer

Slide 233

Slide 233 text

Wait! Aren’t These Actually Microservices? API app compute app worker worker worker child_process API app c l u s t e r c l u s t e r compute app worker worker worker child_process API app compute app worker worker worker child_process API app c l u s t e r c l u s t e r compute app worker worker worker child_process broker load balancer Internet message bus redis redis … …

Slide 234

Slide 234 text

API app compute app worker worker worker child_process API app c l u s t e r c l u s t e r compute app worker worker worker child_process API app compute app worker worker worker child_process API app c l u s t e r c l u s t e r compute app worker worker worker child_process broker load balancer Internet message bus redis redis … … API μ-Service Wait! Aren’t These Actually Microservices?

Slide 235

Slide 235 text

API app compute app worker worker worker child_process API app c l u s t e r c l u s t e r compute app worker worker worker child_process API app compute app worker worker worker child_process API app c l u s t e r c l u s t e r compute app worker worker worker child_process broker load balancer Internet message bus redis redis … … API μ-Service Compute μ-Service Wait! Aren’t These Actually Microservices? * See Also: http://martinfowler.com/articles/microservice-trade-offs.html http://highscalability.com/blog/2014/4/8/microservices-not-a-free-lunch.html https://rclayton.silvrback.com/failing-at-microservices

Slide 236

Slide 236 text

That Means You’ve Become Famous Scalability Will Be the Least of Your Concerns What If I Reach The Scalability Limits Within a Region?

Slide 237

Slide 237 text

Multiple Regions Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Round-Robin DNS The Internet Message Bus Message Bus containers/013-round-robin

Slide 238

Slide 238 text

Multiple Regions Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Round-Robin DNS The Internet Message Bus Message Bus containers/013-round-robin

Slide 239

Slide 239 text

Multiple Regions Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Round-Robin DNS The Internet Message Bus Message Bus containers/013-round-robin

Slide 240

Slide 240 text

Multiple Regions Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Round-Robin DNS The Internet Message Bus Message Bus containers/013-round-robin

Slide 241

Slide 241 text

Multiple Regions Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Round-Robin DNS The Internet Message Bus Message Bus containers/013-round-robin

Slide 242

Slide 242 text

Round-Robin DNS containers/013-round-robin

Slide 243

Slide 243 text

Round-Robin DNS containers/013-round-robin

Slide 244

Slide 244 text

Multiple Regions Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 245

Slide 245 text

Multiple Regions Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 246

Slide 246 text

Multiple Regions Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 247

Slide 247 text

You Can Add More Region 1 Compute AutoScale Group API AutoScale Group LB LB DNS The Internet Message Bus Region 2 Compute AutoScale Group API AutoScale Group Message Bus … Region N Compute AutoScale Group API AutoScale Group Message Bus LB

Slide 248

Slide 248 text

How Do I Manage All This Infrastructure? This is Getting Out of Hand! Region 1 Compute AutoScale Group API AutoScale Group LB LB DNS The Internet Message Bus Region 2 Compute AutoScale Group API AutoScale Group Message Bus … Region N Compute AutoScale Group API AutoScale Group Message Bus LB

Slide 249

Slide 249 text

Configuration Management

Slide 250

Slide 250 text

• No Hard-Coded IP Addresses in Config Files • Let DNS do What it Does Best • Use Environment Variables for Infrastructure Management • Converge Your Infrastructure Using a Central Service • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html ) • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ ) • Chef ( https://www.chef.io/ ) • Puppet ( https://puppetlabs.com/ ) • Ansible ( http://www.ansible.com/ ) • Service Discovery • Consul ( https://www.consul.io/ ) Configuration Management Tips
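
A small sketch of the first three tips: configuration is read from environment variables, hosts are DNS names, and missing values fail fast at startup. The variable names (PORT, REDIS_HOST, MESSAGE_BUS_URL) and the default port are assumptions made for the example.

```javascript
// Builds an immutable config object from the environment (or a plain object).
function loadConfig(env = process.env) {
  const required = (name) => {
    const value = env[name];
    if (value === undefined || value === '') {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  const port = Number(env.PORT || 8080);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return Object.freeze({
    port,
    redisHost: required('REDIS_HOST'),          // a DNS name, not an IP address
    messageBusUrl: required('MESSAGE_BUS_URL'), // resolved through DNS as well
  });
}
```

Called once at startup, this turns a misconfigured instance into an immediate crash instead of a request-time surprise.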

Slide 251

Slide 251 text

• No Hard-Coded IP Addresses in Config Files • Let DNS do What it Does Best • Use Environment Variables for Infrastructure Management • Converge Your Infrastructure Using a Central Service • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html ) • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ ) • Chef ( https://www.chef.io/ ) • Puppet ( https://puppetlabs.com/ ) • Ansible ( http://www.ansible.com/ ) • Service Discovery • Consul ( https://www.consul.io/ ) Configuration Management Tips

Slide 252

Slide 252 text

• No Hard-Coded IP Addresses in Config Files • Let DNS do What it Does Best • Use Environment Variables for Infrastructure Management • Converge Your Infrastructure Using a Central Service • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html ) • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ ) • Chef ( https://www.chef.io/ ) • Puppet ( https://puppetlabs.com/ ) • Ansible ( http://www.ansible.com/ ) • Service Discovery • Consul ( https://www.consul.io/ ) Configuration Management Tips

Slide 253

Slide 253 text

• No Hard-Coded IP Addresses in Config Files • Let DNS do What it Does Best • Use Environment Variables for Infrastructure Management • Converge Your Infrastructure Using a Central Service • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html ) • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ ) • Chef ( https://www.chef.io/ ) • Puppet ( https://puppetlabs.com/ ) • Ansible ( http://www.ansible.com/ ) • Service Discovery • Consul ( https://www.consul.io/ ) Configuration Management Tips

Slide 254

Slide 254 text

• No Hard-Coded IP Addresses in Config Files • Let DNS do What it Does Best • Use Environment Variables for Infrastructure Management • Converge Your Infrastructure Using a Central Service • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html ) • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ ) • Chef ( https://www.chef.io/ ) • Puppet ( https://puppetlabs.com/ ) • Ansible ( http://www.ansible.com/ ) • Service Discovery • Consul ( https://www.consul.io/ ) Configuration Management Tips

Slide 255

Slide 255 text

• No Hard-Coded IP Addresses in Config Files • Let DNS do What it Does Best • Use Environment Variables for Infrastructure Management • Converge Your Infrastructure Using a Central Service • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html ) • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ ) • Chef ( https://www.chef.io/ ) • Puppet ( https://puppetlabs.com/ ) • Ansible ( http://www.ansible.com/ ) • Service Discovery • Consul ( https://www.consul.io/ ) Configuration Management Tips

Slide 256

Slide 256 text

• No Hard-Coded IP Addresses in Config Files • Let DNS do What it Does Best • Use Environment Variables for Infrastructure Management • Converge Your Infrastructure Using a Central Service • Salt Cloud ( https://docs.saltstack.com/en/develop/topics/cloud/index.html ) • AWS CloudFormation ( https://aws.amazon.com/cloudformation/ ) • Chef ( https://www.chef.io/ ) • Puppet ( https://puppetlabs.com/ ) • Ansible ( http://www.ansible.com/ ) • Service Discovery • Consul ( https://www.consul.io/ ) Configuration Management Tips

Slide 257

Slide 257 text

Test Your System as a Whole

Slide 258

Slide 258 text

Test Your System as a Whole

Slide 259

Slide 259 text

CI / CD • Use a CI / CD Pipeline • Show Love to Test-Driven Development • Don’t Forget Functional Tests and Integration Tests

Slide 260

Slide 260 text

Continuously Keep Your Code In Ship Shape • ESLint ( http://eslint.org ) • CodeClimate ( https://codeclimate.com/features ) • GreenKeeper ( http://greenkeeper.io ) • npm scripts (instead of Grunt or Gulp — YMMV)
 ( https://docs.npmjs.com/misc/scripts ) • npm outdated ( https://docs.npmjs.com/cli/outdated ) • git pre-commit hooks ( https://github.com/observing/pre-commit ) • [ hint: Install your development dependencies (such as eslint, babel, gulp, etc.) locally (not globally)! ]

Slide 261

Slide 261 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 262

Slide 262 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 263

Slide 263 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 264

Slide 264 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 265

Slide 265 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 266

Slide 266 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 267

Slide 267 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 268

Slide 268 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 269

Slide 269 text

Are We Done Yet? Load Balancer Load Balancer Region 1 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB Region 2 Compute Service Compute Service redis API Service API Service redis Compute Service Compute Service API Service API Service LB replication replication Round-Robin DNS The Internet Message Bus Message Bus

Slide 270

Slide 270 text

Making the Load Balancer HA * see also: https://www.wikiwand.com/en/Virtual_Router_Redundancy_Protocol Load Balancer client Load Balancer Load Balancer client keepalived active failover

Slide 271

Slide 271 text

Making the Load Balancer HA * see also: https://www.wikiwand.com/en/Virtual_Router_Redundancy_Protocol Load Balancer client Load Balancer Load Balancer client keepalived active failover

Slide 272

Slide 272 text

Making the Load Balancer Highly Available • round-robin DNS • https://www.wikiwand.com/en/Round-robin_DNS • heartbeat • https://www.wikiwand.com/en/Heartbeat_(computing) • keepalived • http://keepalived.org/ * You can use these tools to make any component HA.

Slide 273

Slide 273 text

SSL Termination * * * https://github.com/bumptech/stud Load Balancer Load Balancer client keepalived active failover SSL Terminator SSL Terminator client keepalived active failover Load Balancer Load Balancer

Slide 274

Slide 274 text

SSL Termination * * * https://github.com/bumptech/stud Load Balancer Load Balancer client keepalived active failover SSL Terminator SSL Terminator client keepalived active failover Load Balancer Load Balancer

Slide 275

Slide 275 text

SSL Termination * * * https://github.com/bumptech/stud Load Balancer Load Balancer client keepalived active failover SSL Terminator SSL Terminator client keepalived active failover Load Balancer Load Balancer

Slide 276

Slide 276 text

Make Redis and RabbitMQ Redundant redis redis (master) redis (read replica) redis (read replica) redis (read replica) redis (master) redis (read replica) redis (read replica) redis (read replica) round-robin DNS This will also increase throughput as a side benefit. See http://redis.io/topics/replication and http://redis.io/topics/cluster-tutorial. You can also use a managed “memory as a service” solution. See also https://www.rabbitmq.com/ha.html for how a similar queue mirroring is implemented for a RabbitMQ cluster.
 And similarly, you can use a managed “queue as a service” solution to ease your pain ;)

Slide 277

Slide 277 text

Build Redundancy Everywhere

Slide 278

Slide 278 text

Build Redundancy Everywhere Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.
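
Whichever tool does the polling, the service needs something to poll. Below is a hedged sketch of a health endpoint; the /health path, the function names, and the shape of `checks` are assumptions for illustration.

```javascript
const http = require('http');

// `checks` maps a dependency name to a function that returns true when OK.
function healthReport(checks) {
  const results = {};
  let ok = true;
  for (const [name, check] of Object.entries(checks)) {
    let passed = false;
    try {
      passed = check() === true;
    } catch (err) {
      passed = false; // a throwing check counts as a failure
    }
    results[name] = passed ? 'passing' : 'critical';
    ok = ok && passed;
  }
  return { status: ok ? 200 : 503, body: { ok, results } };
}

// Returns a server that is not listening yet; call .listen(port) yourself.
function createHealthServer(checks) {
  return http.createServer((req, res) => {
    if (req.url !== '/health') {
      res.statusCode = 404;
      return res.end();
    }
    const { status, body } = healthReport(checks);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}
```

An HTTP health check can then treat any non-2xx answer as a reason to take the instance out of rotation.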

Slide 279

Slide 279 text

Build Redundancy Everywhere Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.

Slide 280

Slide 280 text

Build Redundancy Everywhere Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.

Slide 281

Slide 281 text

Build Redundancy Everywhere Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.

Slide 282

Slide 282 text

Build Redundancy Everywhere Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.

Slide 283

Slide 283 text

Build Redundancy Everywhere Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.

Slide 284

Slide 284 text

Build Redundancy Everywhere Note: This is more typically done by using the sidekick health checks of your service discovery tool. See, for example, https://www.consul.io/intro/getting-started/checks.html for details.

Slide 285

Slide 285 text

Torture Your System • Try Chaos Monkey • https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey • Randomly send `kill -9` to Processes • Randomly Knock a Server Offline • Intentionally Run Out of Disk Space • Take an entire data center down
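
The "randomly send kill -9" item can be sketched in a few lines. This is a toy under assumptions (hypothetical function names, injectable random source and kill function so it can be tried safely), not a replacement for Chaos Monkey.

```javascript
// Picks one PID at random, or null when the list is empty.
function pickVictim(pids, random = Math.random) {
  if (pids.length === 0) return null;
  return pids[Math.floor(random() * pids.length)];
}

// Sends SIGKILL (what `kill -9` sends) to one randomly chosen process.
function killRandom(pids, options = {}) {
  const random = options.random || Math.random;
  const kill = options.kill || ((pid, signal) => process.kill(pid, signal));
  const pid = pickVictim(pids, random);
  if (pid !== null) kill(pid, 'SIGKILL');
  return pid;
}
```

Run it against worker PIDs in a staging environment first and watch whether the cluster, the load balancer, and the health checks recover without human help.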

Slide 286

Slide 286 text

Summary Σ

Slide 287

Slide 287 text

Summary Stateless is Better than Stateful Eventual Consistency Build Redundancy Everywhere! Start Up Fast, Shut Down Gracefully Solve Problems That Actually Exist
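
"Shut down gracefully" can look like the sketch below: stop accepting connections, let in-flight requests finish, and force the exit if draining takes too long. The function names and the 10-second timeout are assumptions for illustration.

```javascript
// Drains `server` (anything with close(callback), e.g. an http.Server),
// then exits; a timer forces the exit if draining exceeds `timeoutMs`.
function gracefulShutdown(server, options = {}) {
  const exit = options.exit || ((code) => process.exit(code));
  const timeoutMs = options.timeoutMs || 10000;

  const timer = setTimeout(() => exit(1), timeoutMs);
  timer.unref(); // the timer alone should not keep the process alive

  server.close((err) => {
    clearTimeout(timer);
    exit(err ? 1 : 0);
  });
}

// Wiring (not executed here): react to the usual termination signals.
function installShutdownHandlers(server) {
  for (const signal of ['SIGTERM', 'SIGINT']) {
    process.once(signal, () => gracefulShutdown(server));
  }
}
```

Combined with a load balancer that stops routing to an instance once it stops accepting connections, a deploy or scale-in should not drop requests that were already in flight.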

Slide 288

Slide 288 text

Summary Never Assume, Always Measure Perf Before Scale Infrastructure is Code; Automate It! Keep Configuration Details in Environment Variables Show Love to DNS

Slide 289

Slide 289 text

Summary • Know Your Ecosystem • Know Your Tools • Use Tools, not Rules!

Slide 290

Slide 290 text

Scale 2 ∞ & ⇒ Region 1 Compute AutoScale Group API AutoScale Group LB LB DNS The Internet Message Bus Region 2 Compute AutoScale Group API AutoScale Group Message Bus … Region N Compute AutoScale Group API AutoScale Group Message Bus LB API Service Internet Bastion Simulated by an NGINX static web server * Fetch HTML off of websites * Simplify and convert the HTML to plain text * Do NLP/Tokenization on the plain text * Create tags as a result test API

Slide 291

Slide 291 text

Thank You Questions?