Slide 1

Slide 1 text

Node.js Production Checklist Gergely Nemeth @nthgergo | nemethgergely.com

Slide 2

Slide 2 text

Hi, I am Gergely.
■ Works at GoDaddy
■ Previously RisingStack
■ To chat with me:
  – @nthgergo
  – [email protected]
■ Read my stuff:
  – nemethgergely.com

Slide 3

Slide 3 text

Node.js Production Checklist Gergely Nemeth @nthgergo | nemethgergely.com

Slide 4

Slide 4 text

Why do you need a production checklist?

Slide 5

Slide 5 text

Why do you need a production checklist?
■ If you provide a “three nines” (99.9%) service, this is your error budget:
  – 43 minutes of downtime in a month
  – 2.2 hours of downtime in a quarter
  – 8.8 hours of downtime in a year

Slide 6

Slide 6 text

Why do you need a production checklist?
■ If you provide a “five nines” (99.999%) service, this is your error budget:
  – 26 seconds of downtime in a month
  – 78 seconds of downtime in a quarter
  – 5 minutes of downtime in a year
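The budgets above are plain arithmetic: allowed downtime is the length of the period multiplied by (1 - availability). A minimal sketch to reproduce them, assuming a 30-day month, a 365-day year and a quarter of exactly a quarter year:

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60

// Allowed downtime (in seconds) for an availability target over a period.
function errorBudgetSeconds (availabilityPercent, periodDays) {
  return periodDays * SECONDS_PER_DAY * (1 - availabilityPercent / 100)
}

// Assumed period lengths: 30-day month, 365-day year, quarter = year / 4.
const PERIODS = { month: 30, quarter: 365 / 4, year: 365 }

for (const target of [99.9, 99.999]) {
  for (const [name, days] of Object.entries(PERIODS)) {
    const minutes = errorBudgetSeconds(target, days) / 60
    console.log(`${target}% over a ${name}: ${minutes.toFixed(1)} minutes`)
  }
}
```

Each extra "nine" divides the budget by ten, which is why five nines leaves almost no room for manual intervention.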

Slide 7

Slide 7 text

Why do you need a production checklist?
■ These commitments are called SLAs (service level agreements)
■ An SLA consists of SLOs (service level objectives), like “99% of requests finish in under 200 ms”, plus penalties if you don’t meet them
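To make an SLO like “99% of requests finish in under 200 ms” checkable, you measure the fraction of requests under the threshold and compare it with the target. A minimal sketch, assuming you already collect per-request latencies in milliseconds; the function names and the sample data are hypothetical:

```javascript
// Fraction of requests that finished strictly under the latency threshold.
function sloCompliance (latenciesMs, thresholdMs) {
  if (latenciesMs.length === 0) return 1 // no traffic: nothing violated the SLO
  const fast = latenciesMs.filter((ms) => ms < thresholdMs).length
  return fast / latenciesMs.length
}

// Defaults mirror the example SLO: 99% of requests under 200 ms.
function meetsSlo (latenciesMs, { thresholdMs = 200, target = 0.99 } = {}) {
  return sloCompliance(latenciesMs, thresholdMs) >= target
}

// Hypothetical sample: 98 fast requests and 2 slow ones.
const sample = [...Array(98).fill(120), 350, 900]
console.log(sloCompliance(sample, 200)) // 0.98
console.log(meetsSlo(sample)) // false
```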

Slide 8

Slide 8 text

Why do you need a production checklist? If you are operating a service, you should define SLOs for your customers, whether they are internal or external

Slide 9

Slide 9 text

Today, you will learn about:
■ Error handling in Node.js
■ How to secure your Node.js applications
■ Building a CI/CD pipeline for Node.js
■ How to get full visibility into production systems
  – Best practices for logging
  – Monitoring your Node.js applications
■ What to do when all hell breaks loose

Slide 10

Slide 10 text

Error handling

Slide 11

Slide 11 text

Error handling

Slide 12

Slide 12 text

Error handling Async-await changes how we write Node.js applications

Slide 13

Slide 13 text

Error handling You have to learn to throw again

Slide 14

Slide 14 text

Error handling - Express

```javascript
app.get('/users/:id', (req, res) => {
  const userId = req.params.id
  if (!userId) {
    return res.status(400).json({ error: 'Missing id' })
  }
  Users.get(userId, (err, user) => {
    if (err) {
      return res.status(500).json({ error: err.message })
    }
    res.send(user)
  })
})
```

Slide 15

Slide 15 text

Error handling - Express
- Errors are handled differently across the codebase
  -> Use Express error handlers instead
- Migrate from callbacks to async-await

Slide 16

Slide 16 text

Error handling - Express

```javascript
app.get('/users/:id', (req, res, next) => {
  const userId = req.params.id
  if (!userId) {
    const missingIdError = new Error('Missing id')
    missingIdError.httpStatusCode = 400
    return next(missingIdError)
  }
  Users.get(userId, (err, user) => {
    if (err) {
      err.httpStatusCode = 500
      return next(err)
    }
    res.send(user)
  })
})
```

Slide 17

Slide 17 text

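Error handling - Express

```javascript
// Keep all four parameters: Express recognizes error handlers by their arity.
app.use((err, req, res, next) => {
  // log the error...
  // res.json(err) would send {} because Error#message is not enumerable
  res.status(err.httpStatusCode || 500).json({ error: err.message })
})
```

Add the error-handling middleware last, after all other app.use() and route calls.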
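Why does a single handler at the end of the chain catch everything? Once `next(err)` is called, Express skips regular middleware and only runs handlers declared with four parameters. The sketch below is a simplified model of that dispatch rule, not Express's actual source; `runChain` and the fake `res` object are hypothetical stand-ins so the example runs without Express:

```javascript
// Simplified model of an Express-style middleware chain.
function runChain (middlewares, req, res) {
  let index = 0
  function next (err) {
    const mw = middlewares[index++]
    if (!mw) return // end of the chain (Express would send a default response)
    const isErrorHandler = mw.length === 4
    try {
      if (err && isErrorHandler) return mw(err, req, res, next)
      if (!err && !isErrorHandler) return mw(req, res, next)
      return next(err) // middleware does not match the current mode: skip it
    } catch (thrown) {
      return next(thrown) // synchronous throws are treated like next(err)
    }
  }
  next()
}

// Fake response object standing in for Express's res (assumption for the demo).
const res = {
  statusCode: 200,
  body: null,
  status (code) { this.statusCode = code; return this },
  json (body) { this.body = body; return this }
}

runChain([
  (req, res, next) => {
    const missingIdError = new Error('Missing id')
    missingIdError.httpStatusCode = 400
    next(missingIdError)
  },
  (req, res) => res.json({ ok: true }), // skipped: an error is pending
  (err, req, res, next) => res.status(err.httpStatusCode || 500).json({ error: err.message })
], {}, res)

console.log(res.statusCode, res.body)
```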

Slide 18

Slide 18 text

Error handling - Express

```javascript
const missingIdError = new Error('Missing id')
missingIdError.httpStatusCode = 400
next(missingIdError)
```
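Before reaching for a library, the three lines above can be folded into a tiny factory. A minimal sketch; `httpError` is a hypothetical helper name, and it only sets the `httpStatusCode` property that the error handler reads:

```javascript
// Hypothetical helper: builds an Error that carries the HTTP status code the
// error-handling middleware reads from err.httpStatusCode.
function httpError (statusCode, message) {
  const err = new Error(message)
  err.httpStatusCode = statusCode
  return err
}

// Inside a route handler: return next(httpError(400, 'Missing id'))
const example = httpError(400, 'Missing id')
console.log(example instanceof Error, example.httpStatusCode, example.message)
```

Returning a real `Error` (rather than a plain object) keeps the stack trace, which matters once these errors are logged.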

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Error handling - Express

```javascript
const boom = require('boom')

app.get('/users/:id', (req, res, next) => {
  const userId = req.params.id
  if (!userId) {
    return next(boom.badRequest('missing id'))
  }
  Users.get(userId, (err, user) => {
    if (err) {
      return next(boom.badImplementation(err))
    }
    res.send(user)
  })
})
```

Slide 21

Slide 21 text

Error handling - Express

```javascript
const asyncMiddleware = fn => (req, res, next) => {
  Promise.resolve(fn(req, res, next)).catch(next);
};

module.exports = exports = asyncMiddleware;
```

Wrapping the Express route handlers for proper error propagation
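The wrapper works because an `async` function always returns a promise, so a `throw` inside it becomes a rejection that `.catch(next)` hands to the error handler. A small sketch that exercises it without Express; `failingHandler` and the empty `req`/`res` objects are hypothetical:

```javascript
const asyncMiddleware = (fn) => (req, res, next) => {
  Promise.resolve(fn(req, res, next)).catch(next)
}

// Hypothetical handler that fails asynchronously, e.g. a lost DB connection.
const failingHandler = asyncMiddleware(async () => {
  throw new Error('database unavailable')
})

// Express would pass its own next(); here we just print what it receives.
failingHandler({}, {}, (err) => {
  console.log('next() received:', err.message)
})
```

One caveat: if `fn` is a plain (non-async) function that throws synchronously, the throw happens before `Promise.resolve` runs, so the wrapper does not catch it; Express itself catches synchronous throws in route handlers, so such errors still reach the error handler.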

Slide 22

Slide 22 text

Error handling – Express
The modified error handler

```javascript
app.use((err, req, res, next) => {
  if (err.isServer) {
    // log the error...
    // probably you don't want to log unauthorized access
    // or do you?
  }
  return res.status(err.output.statusCode).json(err.output.payload);
})
```
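The handler above assumes every error is a boom error; a plain `TypeError` from a bug has no `err.output`, so the handler itself would throw. One way to guard against that is to normalize errors first. A sketch under that assumption: `toHttpResponse` is a hypothetical helper that reads the `isBoom`, `isServer` and `output` properties boom errors carry and turns anything else into a generic 500 without leaking details:

```javascript
const http = require('http')

// Hypothetical normalizer: boom errors already carry err.output; anything else
// becomes a generic 500 response that does not expose internal details.
function toHttpResponse (err) {
  if (err && err.isBoom && err.output) {
    return { statusCode: err.output.statusCode, payload: err.output.payload, isServer: err.isServer }
  }
  return {
    statusCode: 500,
    payload: { statusCode: 500, error: http.STATUS_CODES[500], message: 'An internal server error occurred' },
    isServer: true
  }
}

// Error-handling middleware built on the normalizer; keep all four parameters.
function errorHandler (err, req, res, next) {
  const { statusCode, payload, isServer } = toHttpResponse(err)
  if (isServer) {
    console.error(err) // replace with your logger
  }
  return res.status(statusCode).json(payload)
}

// Usage: app.use(errorHandler) after all routes.
```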

Slide 23

Slide 23 text

Error handling - Express

```javascript
// asyncMw: the asyncMiddleware wrapper shown earlier
app.get('/users/:id', asyncMw(async (req, res) => {
  const userId = req.params.id
  if (!userId) {
    throw boom.badRequest('missing id')
  }
  const user = await getUserAsync(userId)
  res.json(user)
}))
```

Slide 24

Slide 24 text

Error handling - Express

```javascript
// With async-await, boom and a central error handler
app.get('/users/:id', asyncMw(async (req, res) => {
  const userId = req.params.id
  if (!userId) {
    throw boom.badRequest('missing id')
  }
  const user = await getUserAsync(userId)
  res.json(user)
}))

// With callbacks and ad-hoc error responses
app.get('/users/:id', (req, res) => {
  const userId = req.params.id
  if (!userId) {
    return res.status(400).json({ error: 'Missing id' })
  }
  Users.get(userId, (err, user) => {
    if (err) {
      return res.status(500).json({ error: err.message })
    }
    res.send(user)
  })
})
```

Slide 25

Slide 25 text

Error handling - Express
• Have unified error handling
• Start using async-await with throw and try-catch
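With async-await, `try`/`catch` is where you translate low-level errors into HTTP errors and let everything unexpected bubble up to the central handler. A minimal sketch; `getUserAsync`, the in-memory `usersById` map and the `NOT_FOUND` code are hypothetical stand-ins for a real data layer:

```javascript
// Hypothetical data-access function; a real one would query a database.
const usersById = new Map([['1', { id: '1', name: 'Ada' }]])

async function getUserAsync (id) {
  const user = usersById.get(id)
  if (!user) {
    const err = new Error(`user ${id} not found`)
    err.code = 'NOT_FOUND'
    throw err
  }
  return user
}

// Translate known domain errors into HTTP errors; rethrow everything else.
async function loadUser (id) {
  try {
    return await getUserAsync(id)
  } catch (err) {
    if (err.code === 'NOT_FOUND') {
      const notFound = new Error(err.message)
      notFound.httpStatusCode = 404
      throw notFound
    }
    throw err // unexpected: let the error handler turn it into a 500
  }
}

loadUser('1').then((user) => console.log(user.name))
loadUser('42').catch((err) => console.log(err.httpStatusCode, err.message))
```

Note the `return await`: without `await`, the rejection would escape the `try` block and skip the translation.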

Slide 26

Slide 26 text

Securing your application

Slide 27

Slide 27 text

Securing your application

Slide 28

Slide 28 text

Securing your application Enable two-factor authentication for your npm account

Slide 29

Slide 29 text

Securing your application More than 500,000 modules in the registry

Slide 30

Slide 30 text

Securing your application

Slide 31

Slide 31 text

Securing your application

Slide 32

Slide 32 text

Securing your application

Slide 33

Slide 33 text

Securing your application
• Node Security Project: security advisories
• Snyk: Vulnerability DB

Slide 34

Slide 34 text

Securing your application
Security HTTP headers
• X-Frame-Options to mitigate clickjacking attacks,
• Strict-Transport-Security to keep your users on HTTPS,
• X-XSS-Protection to help mitigate reflected XSS attacks,
• X-DNS-Prefetch-Control to disable browsers’ DNS prefetching.
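To see what those headers look like on the wire, here is a dependency-free sketch that sets them by hand on a plain Node `http` server. The header values are example choices (assumptions, tune them for your app); helmet sets headers like these for you. Treat `X-XSS-Protection: 1; mode=block` as era-specific: modern browsers have dropped their XSS filters and recent helmet releases set this header to `0` instead.

```javascript
const http = require('http')

// Example values only; adjust max-age and framing policy to your application.
const SECURITY_HEADERS = {
  'X-Frame-Options': 'SAMEORIGIN',
  'Strict-Transport-Security': 'max-age=15552000; includeSubDomains',
  'X-XSS-Protection': '1; mode=block',
  'X-DNS-Prefetch-Control': 'off'
}

function applySecurityHeaders (res) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    res.setHeader(name, value)
  }
}

const server = http.createServer((req, res) => {
  applySecurityHeaders(res)
  res.end('ok')
})
// server.listen(3000)
```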

Slide 35

Slide 35 text

Securing your application
Security HTTP headers

```javascript
const express = require('express')
const helmet = require('helmet')

const app = express()
// helmet() registers middleware that sets security-related HTTP headers
app.use(helmet())
```

Slide 36

Slide 36 text

Securing your application
Validating user input

```javascript
const Joi = require('joi');

const schema = Joi.object().keys({
  username: Joi.string().alphanum().min(3).max(30).required(),
  access_token: [Joi.string(), Joi.number()],
  birthyear: Joi.number().integer().min(1900).max(2017),
  email: Joi.string().email()
}).with('username', 'birthyear')

// Return result (Joi.validate is the pre-v16 API; v16+ uses schema.validate(value))
const result = Joi.validate({ username: 'abc', birthyear: 1994 }, schema)
// result.error === null -> valid
```
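To spell out what each rule in that schema checks, here is a hand-written, dependency-free sketch of the same constraints. It is an illustration only, not how Joi works internally: it does no string-to-number conversion (Joi converts by default) and its email pattern is far looser than Joi's. `validateSignup` is a hypothetical name.

```javascript
const ALLOWED_KEYS = new Set(['username', 'access_token', 'birthyear', 'email'])

function validateSignup (input) {
  const errors = []
  const { username, access_token: accessToken, birthyear, email } = input

  // Joi object schemas reject unknown keys by default
  for (const key of Object.keys(input)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`"${key}" is not allowed`)
  }
  // username: Joi.string().alphanum().min(3).max(30).required()
  if (typeof username !== 'string' || !/^[a-zA-Z0-9]{3,30}$/.test(username)) {
    errors.push('username must be 3-30 alphanumeric characters')
  }
  // access_token: [Joi.string(), Joi.number()]
  if (accessToken !== undefined && typeof accessToken !== 'string' && typeof accessToken !== 'number') {
    errors.push('access_token must be a string or a number')
  }
  // birthyear: Joi.number().integer().min(1900).max(2017)
  if (birthyear !== undefined && !(Number.isInteger(birthyear) && birthyear >= 1900 && birthyear <= 2017)) {
    errors.push('birthyear must be an integer between 1900 and 2017')
  }
  // email: Joi.string().email() (simplified pattern, not Joi's full check)
  if (email !== undefined && (typeof email !== 'string' || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email))) {
    errors.push('email must be a valid email address')
  }
  // .with('username', 'birthyear'): username requires birthyear
  if (username !== undefined && birthyear === undefined) {
    errors.push('birthyear is required when username is present')
  }
  return { error: errors.length > 0 ? errors : null, value: input }
}

console.log(validateSignup({ username: 'abc', birthyear: 1994 }).error)
console.log(validateSignup({ username: 'ab', email: 'not-an-email' }).error)
```

In practice, prefer a maintained schema library like Joi; the point here is only to make the schema's rules explicit.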

Slide 37

Slide 37 text

Logging best practices

Slide 38

Slide 38 text

Logging best practices
Requirements
• Timestamps to know when a given event happened,
• Format to keep log lines readable for both humans and machines,
• Destination should be the standard output and error only,
• Support for log levels

Slide 39

Slide 39 text

Logging best practices
Using log levels
• Error
  • General errors, always reported
  • Used whenever an unexpected error happens that prevents further processing
  • The app may try to recover (for example, when the database connection is lost) or forcefully terminate

Slide 40

Slide 40 text

Logging best practices
Using log levels
• Warn
  • For events indicating irregular circumstances, with a clearly defined recovery strategy
  • They have no impact on system availability or performance
  • These events should be reported too

Slide 41

Slide 41 text

Logging best practices
Using log levels
• Info
  • These events indicate major state changes in the application, like the startup of the HTTP server
  • Each component should log:
    • When it starts and when it becomes operational
    • When it starts shutting down, and just before it stops

Slide 42

Slide 42 text

Logging best practices
Using log levels
• Debug
  • Diagnostic events for internal state changes
  • These events are usually not reported; they are only for troubleshooting
  • At the discretion of the engineer developing the system component
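The requirements and levels above fit in a few lines of code. A minimal sketch, not a replacement for a production logger like winston or pino: one JSON object per line with an ISO timestamp, errors and warnings on stderr, everything else on stdout, and a level threshold. `createLogger` and the `LOG_LEVEL` environment variable are hypothetical names.

```javascript
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 }

function createLogger ({ level = 'info', now = () => new Date() } = {}) {
  const threshold = LEVELS[level]
  if (threshold === undefined) throw new Error(`unknown log level: ${level}`)
  const logger = {}
  for (const [name, severity] of Object.entries(LEVELS)) {
    // Returns true if the line was written, false if filtered out by level.
    logger[name] = (message, meta = {}) => {
      if (severity > threshold) return false // e.g. debug is dropped at 'info'
      // Error objects serialize to {} with JSON.stringify, so copy the fields.
      const fields = message instanceof Error
        ? { message: message.message, stack: message.stack }
        : { message }
      const line = JSON.stringify({ timestamp: now().toISOString(), level: name, ...fields, ...meta })
      const stream = severity <= LEVELS.warn ? process.stderr : process.stdout
      stream.write(line + '\n')
      return true
    }
  }
  return logger
}

const logger = createLogger({ level: process.env.LOG_LEVEL || 'info' })
logger.info('server started initializing')
logger.debug('not printed at the default info level')
```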

Slide 43

Slide 43 text

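Logging best practices
Example

```javascript
// Assumes `logger` and `PORT` are defined elsewhere
logger.info('server started initializing')

const app = express()

app.get('/', (req, res) => {
  res.send('ok!')
})

const server = app.listen(PORT, () => {
  logger.info(`server started listening on ${PORT}`)
})

// Listen errors (e.g. EADDRINUSE) are emitted on the server, not passed to the callback
server.on('error', (err) => {
  logger.error(err)
})
```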

Slide 44

Slide 44 text

Logging best practices
Log the original URL of the request for errors

```javascript
app.use((err, req, res, next) => {
  if (err.isServer) {
    logger.error(err.message, {
      stack: err.stack,
      originalUrl: req.originalUrl
    })
  }
  return res.status(err.output.statusCode).json(err.output.payload);
})
```

Slide 45

Slide 45 text

Building a CI/CD pipeline

Slide 46

Slide 46 text

Building a CI/CD pipeline
• Protect the master branch
  • Only approved pull requests can be merged
  • Contributors with write access can approve

Slide 47

Slide 47 text

Building a CI/CD pipeline
• Run automated checks on your codebase:
  • Linting
  • Unit, integration, functional tests
  • Code coverage
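Most CI services simply run a sequence of commands and fail the build on the first non-zero exit code. A sketch of that fail-fast behavior as a Node script; the npm script names in `CHECKS` are hypothetical, and the `run` parameter exists so the logic can be tested without shelling out:

```javascript
const { spawnSync } = require('child_process')

// Hypothetical npm script names: adjust them to the scripts in your package.json.
const CHECKS = ['lint', 'test', 'coverage']

// Runs each check in order and stops at the first failure (fail fast).
// `run` returns the exit code of a check; the default shells out to npm.
function runChecks (checks, run = defaultRun) {
  for (const name of checks) {
    const status = run(name)
    if (status !== 0) {
      console.error(`check "${name}" failed with exit code ${status}`)
      return false
    }
  }
  return true
}

function defaultRun (name) {
  // shell: true on Windows lets `npm` resolve to npm.cmd
  return spawnSync('npm', ['run', name], { stdio: 'inherit', shell: process.platform === 'win32' }).status
}

// In a CI step: process.exitCode = runChecks(CHECKS) ? 0 : 1
```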

Slide 48

Slide 48 text

Building a CI/CD pipeline
• With good test coverage, you can deploy to production on every commit, which gives you:
  • Reduced risk
  • Faster reaction times
  • Flexible release options

Slide 49

Slide 49 text

Monitoring your applications

Slide 50

Slide 50 text

Monitoring your applications
Most important metrics to watch
• Error rate, as errors directly affect customer satisfaction;
• Latency, as the slower the service, the more likely your customers are to abandon your application;
• Throughput, to put error rate and latency in context;
• Saturation, to tell whether you can handle more traffic.
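Three of these signals can be derived directly from request records collected over a time window. A sketch assuming each record looks like `{ statusCode, durationMs }` (a hypothetical shape) and counting only 5xx responses as errors; saturation cannot be computed this way, since it needs resource metrics such as CPU, memory or event loop lag:

```javascript
// Nearest-rank percentile over an already sorted array (q between 0 and 1).
function nearestRank (sortedValues, q) {
  if (sortedValues.length === 0) return null
  const rank = Math.ceil(q * sortedValues.length)
  return sortedValues[Math.min(Math.max(rank, 1), sortedValues.length) - 1]
}

function summarize (records, windowSeconds) {
  const durations = records.map((r) => r.durationMs).sort((a, b) => a - b)
  const errors = records.filter((r) => r.statusCode >= 500).length
  return {
    throughputPerSecond: records.length / windowSeconds,
    errorRate: records.length ? errors / records.length : 0,
    p50LatencyMs: nearestRank(durations, 0.5),
    p99LatencyMs: nearestRank(durations, 0.99)
  }
}
```

Percentiles are used instead of the mean because a handful of very slow requests barely moves the average but is exactly what the slowest users experience.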

Slide 51

Slide 51 text

Monitoring your applications How is your app doing?

Slide 52

Slide 52 text

Monitoring your applications Aggregated metrics have little value

Slide 53

Slide 53 text

Monitoring your applications
You want to have route-level metrics.
This way of monitoring is called white-box monitoring.
Tools like Prometheus, New Relic, Opbeat or Dynatrace can help you implement it.
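Why aggregated metrics hide problems: a flood of fast health checks can dilute one slow endpoint until the overall average looks healthy. A sketch that groups request records by route template instead; the record shape `{ route, statusCode, durationMs }` is an assumption (in Express, the matched template is available as `req.route.path`):

```javascript
// Group request records by route template and compute per-route stats.
function perRouteStats (records) {
  const byRoute = new Map()
  for (const r of records) {
    const s = byRoute.get(r.route) || { count: 0, errors: 0, totalMs: 0, maxMs: 0 }
    s.count += 1
    if (r.statusCode >= 500) s.errors += 1
    s.totalMs += r.durationMs
    s.maxMs = Math.max(s.maxMs, r.durationMs)
    byRoute.set(r.route, s)
  }
  const result = {}
  for (const [route, s] of byRoute) {
    result[route] = { count: s.count, errorRate: s.errors / s.count, avgMs: s.totalMs / s.count, maxMs: s.maxMs }
  }
  return result
}

// Hypothetical traffic: many fast health checks, a few slow user lookups.
const traffic = [
  ...Array.from({ length: 98 }, () => ({ route: '/health', statusCode: 200, durationMs: 2 })),
  { route: '/users/:id', statusCode: 200, durationMs: 1500 },
  { route: '/users/:id', statusCode: 500, durationMs: 3000 }
]
console.log(perRouteStats(traffic))
```

Use the route template (`/users/:id`) rather than the raw URL (`/users/42`) as the key; otherwise every user id becomes its own metric series.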

Slide 54

Slide 54 text

Monitoring your applications
Can users access it?
This way of monitoring is called black-box monitoring.
Tools like Pingdom or ping.apex.sh provide these services.
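At its core, a black-box check is an outside request with a timeout: did a 2xx answer arrive in time? A minimal sketch using only Node's `http`/`https` modules; `probe` is a hypothetical name, and a real service would run it from several locations and alert on repeated failures rather than a single one:

```javascript
const http = require('http')
const https = require('https')

// Request a URL from outside the app's process and report whether it
// answered with a 2xx status within the timeout.
function probe (url, timeoutMs = 5000) {
  const client = url.startsWith('https:') ? https : http
  return new Promise((resolve) => {
    const started = Date.now()
    const req = client.get(url, (res) => {
      res.resume() // drain the body; only status and timing matter here
      res.on('end', () => resolve({
        up: res.statusCode >= 200 && res.statusCode < 300,
        statusCode: res.statusCode,
        ms: Date.now() - started
      }))
    })
    req.setTimeout(timeoutMs, () => req.destroy(new Error('timeout')))
    req.on('error', (err) => resolve({ up: false, error: err.message, ms: Date.now() - started }))
  })
}

// Usage: probe('https://example.com/').then(console.log)
```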

Slide 55

Slide 55 text

Incident handling

Slide 56

Slide 56 text

Incident handling You build it, you run it

Slide 57

Slide 57 text

Incident handling
You build it, you run it
• Teams will think about how their software is going to run in production
• Encourages ownership and accountability, which leads to more independent, responsible teammates
• Leads to operational excellence
• Which leads to more satisfied customers

Slide 58

Slide 58 text

Incident handling
When all hell breaks loose
• Have a “War Room” / “On call” channel in your instant messaging system
• When an issue happens, notify your customers through a status page, and keep it up to date throughout the investigation
• Do post-mortems on what led to the issue and what actions will be taken to prevent it in the future

Slide 59

Slide 59 text

Incident handling
The human aspect
• At least 3 people for a 24/7 schedule
• With distributed teams you can minimize night shifts
• Set up a pager with escalation policies
• Have a secondary team on standby
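A rotation and an escalation policy are small enough to write down as code, which also makes them easy to review. A sketch with hypothetical names, step delays and roles (paging tools let you configure the equivalent): a weekly round-robin where, with at least three people, everyone is primary one week, secondary another, and has at least one week fully off; and an escalation policy that widens the circle while a page stays unacknowledged.

```javascript
// Weekly round-robin: primary this week becomes free, secondary becomes primary.
function onCallForWeek (people, weekIndex) {
  if (people.length < 3) throw new Error('a 24/7 rotation needs at least 3 people')
  const n = people.length
  return { primary: people[weekIndex % n], secondary: people[(weekIndex + 1) % n] }
}

// Hypothetical escalation policy: who gets paged after N unacknowledged minutes.
const ESCALATION_POLICY = [
  { afterMinutes: 0, notify: ['primary on-call'] },
  { afterMinutes: 15, notify: ['secondary on-call'] },
  { afterMinutes: 30, notify: ['engineering manager'] }
]

function whoToNotify (policy, minutesUnacknowledged) {
  return policy
    .filter((step) => minutesUnacknowledged >= step.afterMinutes)
    .flatMap((step) => step.notify)
}

console.log(onCallForWeek(['ana', 'bo', 'cy'], 0))
console.log(whoToNotify(ESCALATION_POLICY, 20))
```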

Slide 60

Slide 60 text

Incident handling
The human aspect
• A fair number is one or two issues per on-call shift
• With more, you risk:
  • Burning out your team
  • Lower quality of work because of lost focus
• With fewer, they may lose touch with the production system

Slide 61

Slide 61 text

Summary
• How to handle errors in Node.js using async-await
• The most important tasks automated by a CI/CD pipeline
• Logging and monitoring best practices
• Incident handling

Slide 62

Slide 62 text

Thanks! I’ll be around, just say hi if you want to talk

Slide 63

Slide 63 text

Oh, and we are hiring! I’ll be around, just say hi if you want to talk https://careers.godaddy.net/

Slide 64

Slide 64 text

Resources
• https://blog.risingstack.com/node-js-logging-tutorial/
• https://nemethgergely.com/nodejs-security-overview/
• https://nodesecurity.io/
• https://wiki.opendaylight.org/view/BestPractices/Logging_Best_Practices
• https://blog.risingstack.com/node-js-security-checklist/