Node.js Production Checklist - SFNode


Gergely Nemeth

February 01, 2018

Transcript

  1. Node.js Production Checklist Gergely Nemeth @nthgergo | nemethgergely.com

  2. Hi, I am Gergely.
    ▪ Works at GoDaddy here
    ▪ Previously RisingStack
    ▪ To chat with me:
        – @nthgergo
        – mail@nemethgergely.com
    ▪ Read my stuff:
        – nemethgergely.com
  3. Why do you need a production checklist?

  4. Why do you need a production checklist?
    ▪ Providing a “three nines” (99.9%) service, this is your error budget:
        – 43 minutes of downtime in a month
        – 2.2 hours of downtime in a quarter
        – 8.8 hours of downtime in a year
  5. Why do you need a production checklist?
    ▪ Providing a “five nines” (99.999%) service:
        – 26 seconds of downtime in a month
        – 78 seconds of downtime in a quarter
        – 5 minutes of downtime in a year
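The downtime figures on the two slides above follow from simple arithmetic. A minimal sketch, assuming a 30-day month; the function and variable names are illustrative and not from the talk:

```javascript
// Seconds of allowed downtime for an availability target over a period.
// `availability` is a fraction (0.999 for "three nines"); `periodDays` is
// the length of the measurement window in days.
function downtimeBudgetSeconds (availability, periodDays) {
  const secondsInPeriod = periodDays * 24 * 60 * 60
  return (1 - availability) * secondsInPeriod
}

const threeNinesMonth = downtimeBudgetSeconds(0.999, 30) // about 43 minutes
const fiveNinesMonth = downtimeBudgetSeconds(0.99999, 30) // about 26 seconds

console.log(`${(threeNinesMonth / 60).toFixed(1)} minutes per 30-day month`)
console.log(`${fiveNinesMonth.toFixed(1)} seconds per 30-day month`)
```

Changing `periodDays` to 90 or 365 gives the quarterly and yearly budgets.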
  6. Why do you need a production checklist?
    ▪ These are called SLAs (service level agreements)
    ▪ An SLA consists of SLOs (service level objectives), such as 99% of the requests finishing under 200 ms, and penalties if you don’t meet them
  7. Why do you need a production checklist?
    If you are operating a service, you should define SLOs for your customers – no matter if they are internal or external
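An SLO such as "99% of the requests finish under 200 ms" can be checked mechanically against measured latencies. A rough sketch; the helper names and the sample data are made up for illustration:

```javascript
// Fraction of requests that finished in under `thresholdMs`.
function fractionWithin (latenciesMs, thresholdMs) {
  if (latenciesMs.length === 0) return 1 // no traffic, nothing violated
  const fast = latenciesMs.filter((ms) => ms < thresholdMs).length
  return fast / latenciesMs.length
}

// True when at least `target` (for example 0.99) of requests met the threshold.
function meetsLatencySlo (latenciesMs, thresholdMs, target) {
  return fractionWithin(latenciesMs, thresholdMs) >= target
}

// Made-up sample: 99 fast requests and 1 slow one.
const sample = new Array(99).fill(120).concat([950])
console.log(meetsLatencySlo(sample, 200, 0.99))
```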
  8. Today, you will learn about:
    ▪ Error handling in Node.js
    ▪ How to secure your Node.js applications
    ▪ How to get full visibility into production systems
        – Best practices for logging
        – Monitoring your Node.js applications
    ▪ What to do after all hell breaks loose
  9. Error handling

  10. Error handling Async-await changes how we write Node.js applications

  11. Error handling You have to learn to throw again

  12. Error handling - Express
        app.get('/users/:id', (req, res) => {
          const userId = req.params.id
          if (!userId) {
            // status() sets the code and stays chainable; sendStatus() would already send the response
            return res.status(400).json({ error: 'Missing id' })
          }
          Users.get(userId, (err, user) => {
            if (err) {
              return res.status(500).json(err)
            }
            res.send(user)
          })
        })
  13. Error handling - Express
    - Errors are handled differently across the codebase
        -> Use Express error handlers instead
    - Migrate from callbacks to async-await
  14. Error handling - Express
        app.get('/users/:id', (req, res, next) => {
          const userId = req.params.id
          if (!userId) {
            const missingIdError = new Error('Missing id')
            missingIdError.httpStatusCode = 400
            return next(missingIdError)
          }
          Users.get(userId, (err, user) => {
            if (err) {
              err.httpStatusCode = 500
              return next(err)
            }
            res.send(user)
          })
        })
  15. Error handling - Express
        app.use((err, req, res, next) => {
          // log the error...
          // fall back to 500 when the error carries no status code
          res.status(err.httpStatusCode || 500).json(err)
        })
    Add an error handler middleware as the last one
  16. Error handling - Express
        const missingIdError = new Error('Missing id')
        missingIdError.httpStatusCode = 400
        next(missingIdError)
  17. None
  18. Error handling - Express
        const boom = require('boom')
        app.get('/users/:id', (req, res, next) => {
          const userId = req.params.id
          if (!userId) {
            return next(boom.badRequest('missing id'))
          }
          Users.get(userId, (err, user) => {
            if (err) {
              return next(boom.badImplementation(err))
            }
            res.send(user)
          })
        })
  19. Error handling - Express
        const asyncMiddleware = fn => (req, res, next) => {
          Promise.resolve(fn(req, res, next)).catch(next)
        }
        module.exports = exports = asyncMiddleware
    Wrapping the Express route handlers for proper error propagation
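To see what the wrapper does without starting a server, it can be exercised with stand-in objects. In this sketch `failingHandler`, `fakeReq`, `fakeRes` and `fakeNext` are hypothetical placeholders for what Express would normally supply:

```javascript
// Same wrapper as on the slide: forwards a rejected promise to next().
const asyncMiddleware = (fn) => (req, res, next) => {
  Promise.resolve(fn(req, res, next)).catch(next)
}

// Hypothetical handler that fails asynchronously.
const failingHandler = async (req, res) => {
  throw new Error('user lookup failed')
}

// Stand-ins for the arguments Express would pass in.
const fakeReq = { params: { id: '42' } }
const fakeRes = {}
const fakeNext = (err) => {
  console.log('next() received:', err.message)
}

asyncMiddleware(failingHandler)(fakeReq, fakeRes, fakeNext)
```

Without the wrapper, the rejection from an `async` handler would never reach the error-handling middleware in Express 4, since nothing calls `next` with it.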
  20. Error handling - Express
        // the handler must be declared async to use await
        app.get('/users/:id', asyncMw(async (req, res) => {
          const userId = req.params.id
          if (!userId) {
            throw boom.badRequest('missing id')
          }
          const user = await getUserAsync(userId)
          res.json(user)
        }))
  21. Error handling - Express
    The modified error handler
        app.use((err, req, res, next) => {
          if (err.isServer) {
            // log the error...
            // probably you don't want to log unauthorized access
            // or do you?
          }
          return res.status(err.output.statusCode).json(err.output.payload)
        })
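The handler above relies on three fields of the error: `isServer`, `output.statusCode` and `output.payload`. The sketch below builds errors of that shape by hand to show how the logging decision and the response body fall out of them. It is a stand-in written for illustration, not boom's implementation, and `httpError` and `toResponse` are hypothetical names:

```javascript
// Hand-rolled error with the fields the error handler reads.
function httpError (statusCode, message) {
  const err = new Error(message)
  const isServer = statusCode >= 500
  err.isServer = isServer
  err.output = {
    statusCode,
    payload: {
      statusCode,
      // Hide internal details from clients on 5xx responses.
      message: isServer ? 'An internal server error occurred' : message
    }
  }
  return err
}

// Decide what to log and what to send, mirroring the handler on the slide.
function toResponse (err) {
  return {
    shouldLog: err.isServer === true,
    status: err.output.statusCode,
    body: err.output.payload
  }
}

console.log(toResponse(httpError(400, 'missing id')))
console.log(toResponse(httpError(500, 'db password rejected')))
```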
  22. Error handling - Express
        // with async-await and boom
        app.get('/users/:id', asyncMw(async (req, res) => {
          const userId = req.params.id
          if (!userId) {
            throw boom.badRequest('missing id')
          }
          const user = await getUserAsync(userId)
          res.json(user)
        }))

        // with callbacks and ad hoc error responses
        app.get('/users/:id', (req, res) => {
          const userId = req.params.id
          if (!userId) {
            return res.status(400).json({ error: 'Missing id' })
          }
          Users.get(userId, (err, user) => {
            if (err) {
              return res.status(500).json(err)
            }
            res.send(user)
          })
        })
  23. Securing your application

  24. Securing your application

  25. Securing your application Enable two-factor authentication for your npm account

  26. Securing your application More than 575,000 modules in the registry

  27. Securing your application

  28. Securing your application

  29. Securing your application

  30. Securing your application You are what you require

  31. Securing your application
    • Node Security Project: security advisories
    • Snyk: Vulnerability DB
  32. Securing your application
    Security HTTP headers
    • X-Frame-Options to mitigate clickjacking attacks,
    • Strict-Transport-Security to keep your users on HTTPS,
    • X-XSS-Protection to prevent reflected XSS attacks,
    • X-DNS-Prefetch-Control to disable browsers’ DNS prefetching.
  33. Securing your application
    Security HTTP headers
        const express = require('express')
        const helmet = require('helmet')
        const app = express()
        app.use(helmet())
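For a sense of what such a middleware does, here is a hand-written sketch that sets the four headers listed earlier. The header values are illustrative assumptions and may differ from what helmet sends by default; `securityHeaders` is a hypothetical name:

```javascript
// Illustrative values; tune them (for example the HSTS max-age) per deployment.
const SECURITY_HEADERS = {
  'X-Frame-Options': 'SAMEORIGIN',
  'Strict-Transport-Security': 'max-age=15552000; includeSubDomains',
  'X-XSS-Protection': '1; mode=block',
  'X-DNS-Prefetch-Control': 'off'
}

// Middleware-shaped function: (req, res, next), the signature Express expects.
function securityHeaders (req, res, next) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    res.setHeader(name, value)
  }
  next()
}

// Exercise it with a fake response object instead of a running server.
const sent = {}
securityHeaders({}, { setHeader: (name, value) => { sent[name] = value } }, () => {})
console.log(sent)
```

In a real application a maintained package is usually the safer choice, since recommended header values change over time.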
  34. Securing your application
    Validating user input
        const Joi = require('joi')
        const schema = Joi.object().keys({
          username: Joi.string().alphanum().min(3).max(30).required(),
          access_token: [Joi.string(), Joi.number()],
          birthyear: Joi.number().integer().min(1900).max(2017),
          email: Joi.string().email()
        }).with('username', 'birthyear')
        // Return result
        const result = Joi.validate({ username: 'abc', birthyear: 1994 }, schema)
        // result.error === null -> valid
  35. Logging best practices

  36. Logging best practices
    Requirements
    • Timestamps to know when a given event happened,
    • Format to keep log lines readable for both humans and machines,
    • Destination should be the standard output and error only,
    • Support for log levels
  37. Logging best practices
    Using log levels
    • Error
        • General errors, always reported
        • Used whenever an unexpected error happens which prevents further processing
        • The app may try to recover (for example when a database connection is lost) or forcefully terminate
  38. Logging best practices
    Using log levels
    • Warn
        • For events indicating irregular circumstances, with a clearly defined recovery strategy
        • They have no impact on system availability or performance
        • These events should be reported too
  39. Logging best practices
    Using log levels
    • Info
        • These events indicate major state changes in the application, like the startup of the HTTP server
        • Each component should log:
            • When it starts and when it becomes operational
            • When it starts shutting down, and just before it stops
  40. Logging best practices
    Using log levels
    • Debug
        • Diagnostic-level events, for internal state changes
        • These events are usually not reported, just for troubleshooting
        • At the discretion of the engineer developing the system component
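The requirements and levels above can be sketched as a very small logger: one JSON line per event, a timestamp on every line, output only to standard output and standard error, and a configurable level. This is an illustration rather than a replacement for an established logging library; `createLogger` and its options are assumed names:

```javascript
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 }

// `level` is the most verbose level that still gets written.
// `out` and `err` default to the process streams; they are injectable
// here only to make the sketch easy to try out.
function createLogger ({ level = 'info', out = process.stdout, err = process.stderr } = {}) {
  const logger = {}
  for (const name of Object.keys(LEVELS)) {
    logger[name] = (message, meta = {}) => {
      if (LEVELS[name] > LEVELS[level]) return
      const line = JSON.stringify({
        time: new Date().toISOString(),
        level: name,
        message,
        ...meta
      })
      // Errors and warnings go to stderr, everything else to stdout.
      const stream = name === 'error' || name === 'warn' ? err : out
      stream.write(line + '\n')
    }
  }
  return logger
}

const logger = createLogger({ level: 'info' })
logger.info('server started listening', { port: 3000 })
logger.debug('cache warmed') // dropped: more verbose than the configured level
```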
  41. Logging best practices
    Example
        logger.info('server started initializing')
        const app = express()
        app.get('/', (req, res) => {
          res.send('ok!')
        })
        const server = app.listen(PORT, () => {
          logger.info(`server started listening on ${PORT}`)
        })
        // listen failures (such as EADDRINUSE) arrive as 'error' events, not as a callback argument
        server.on('error', (err) => logger.error(err))
  42. Logging best practices
    Log the original URL of the request for errors
        app.use((err, req, res, next) => {
          if (err.isServer) {
            logger.error(err.message, {
              stack: err.stack,
              originalUrl: req.originalUrl
            })
          }
          return res.status(err.output.statusCode).json(err.output.payload)
        })
  43. Graceful shutdown
    • When deploying new versions of your application, you will replace old versions
    • Listen on SIGTERM
    • Stop accepting new requests
    • Serve ongoing requests
    • Clean up the resources your app used
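The steps above can be sketched with Node's built-in `http` module. `gracefulShutdown`, `start` and the `cleanup` hook are hypothetical names, and the cleanup body is left empty because it depends on which resources the app holds:

```javascript
const http = require('http')

// Stop accepting connections, wait for in-flight requests, then clean up.
// `cleanup` is a hypothetical hook for closing database pools and the like.
function gracefulShutdown (server, cleanup) {
  return new Promise((resolve, reject) => {
    // close() stops accepting new connections; the callback fires once
    // the existing connections have ended.
    server.close((err) => {
      if (err) return reject(err)
      Promise.resolve().then(cleanup).then(resolve, reject)
    })
  })
}

// Wire the steps to SIGTERM. Not invoked here, so nothing starts listening.
function start (port) {
  const server = http.createServer((req, res) => res.end('ok'))
  server.listen(port)
  process.once('SIGTERM', () => {
    gracefulShutdown(server, async () => { /* release resources here */ })
      .then(() => process.exit(0))
      .catch((err) => {
        console.error(err)
        process.exit(1)
      })
  })
  return server
}
```

Calling `start(3000)` would run it. One caveat of this simple version: idle keep-alive connections can delay the `close` callback on older Node.js versions, so a production setup typically also needs a timeout or connection tracking.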
  44. Graceful shutdown
    Meet Terminus
    • Provides
        • Graceful shutdown,
        • Health checks for HTTP applications
    • Works with any Node.js HTTP servers
    • Built on stoppable
    • github.com/godaddy/terminus
  45. Graceful shutdown Meet Terminus

  46. Monitoring your applications

  47. Monitoring your applications
    Most important metrics to watch
    • Error rate, as errors directly affect customer satisfaction;
    • Latency, as the slower the service, the more likely your customers are to close your application;
    • Throughput, to put error rate and latency in context;
    • Saturation, to tell if you can handle more traffic.
  48. Monitoring your applications How is your app doing?

  49. Monitoring your applications Aggregated metrics have little value

  50. Monitoring your applications
    You want to have route-level metrics
    This way of monitoring is called white box monitoring
    Tools like Prometheus, New Relic, Opbeat or Dynatrace can help to implement it.
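As a rough illustration of why route-level numbers are more useful than a single aggregate, the sketch below keeps request count, error rate and latency per route in memory. `RouteMetrics` is a made-up class and the recorded values are invented; the tools named above do this with far more care (histograms, labels, export):

```javascript
class RouteMetrics {
  constructor () {
    this.routes = new Map()
  }

  // Record one finished request for a route such as 'GET /users/:id'.
  record (route, statusCode, durationMs) {
    const entry = this.routes.get(route) || { count: 0, errors: 0, totalMs: 0, maxMs: 0 }
    entry.count += 1
    if (statusCode >= 500) entry.errors += 1
    entry.totalMs += durationMs
    entry.maxMs = Math.max(entry.maxMs, durationMs)
    this.routes.set(route, entry)
  }

  // Error rate and latency for a single route, or null if it was never hit.
  summary (route) {
    const entry = this.routes.get(route)
    if (!entry) return null
    return {
      count: entry.count,
      errorRate: entry.errors / entry.count,
      avgMs: entry.totalMs / entry.count,
      maxMs: entry.maxMs
    }
  }
}

const metrics = new RouteMetrics()
metrics.record('GET /users/:id', 200, 35)
metrics.record('GET /users/:id', 500, 900)
metrics.record('GET /health', 200, 2)
console.log(metrics.summary('GET /users/:id'))
console.log(metrics.summary('GET /health'))
```

Averaged together, the cheap health checks would mask how slow and error-prone the user route is.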
  51. Monitoring your applications
    Can users access it?
    This way of monitoring is called black box monitoring
    Tools like Pingdom or ping.apex.sh provide these services.
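A black box check only looks at the service from the outside. The sketch below sends one HTTP request and reports whether it succeeded and how long it took. `probe` is a hypothetical helper, treating only 2xx and 3xx responses as "up" is an assumption, and it handles plain `http:` URLs only:

```javascript
const http = require('http')

// Resolve with { up, statusCode, latencyMs } or { up: false, error, latencyMs }.
// Never rejects, so a scheduler can call it repeatedly without extra handling.
function probe (url, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const startedAt = Date.now()
    // agent: false uses a one-off connection that is closed after the response.
    const req = http.get(url, { agent: false }, (res) => {
      res.resume() // discard the body, we only care about status and timing
      res.on('end', () => {
        resolve({
          up: res.statusCode >= 200 && res.statusCode < 400,
          statusCode: res.statusCode,
          latencyMs: Date.now() - startedAt
        })
      })
    })
    req.setTimeout(timeoutMs, () => req.destroy(new Error('timeout')))
    req.on('error', (err) => {
      resolve({ up: false, error: err.message, latencyMs: Date.now() - startedAt })
    })
  })
}
```

A call such as `probe('http://localhost:3000/').then(console.log)` run on a schedule from a different machine gives a crude version of what the hosted services above provide.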
  52. Incident handling

  53. Incident handling You build it, you run it

  54. Incident handling
    You build it, you run it
    • Teams will think about how their software is going to run in production
    • Encourages ownership and accountability which leads to more independent, responsible teammates
    • Leads to operational excellence
    • Which leads to more satisfied customers
  55. Disaster recovery
    • Disasters can happen from natural or human-induced causes
    • Involves tools, policies and procedures to enable recovery from disasters
  56. Disaster recovery
    Meet Ark (by Heptio)
    • Utility for managing disaster recovery for Kubernetes cluster resources and persistent volumes
    • Helps with
        • Disaster recovery
        • Cloud provider migration
        • Cloning the production environment for development / testing
  57. Disaster recovery Meet Ark

  58. Disaster recovery Meet Ark • ark restore create [backup-name] [selector]

  59. Thanks! I’ll be around, just say hi if you want to talk
  60. Oh, and we are hiring! I’ll be around, just say hi if you want to talk
  61. Resources
    • https://blog.risingstack.com/node-js-logging-tutorial/
    • https://nemethgergely.com/nodejs-security-overview/
    • https://nodesecurity.io/
    • https://wiki.opendaylight.org/view/BestPractices/Logging_Best_Practices
    • https://blog.risingstack.com/node-js-security-checklist/