Node.js Production Checklist - SFNode

Gergely Nemeth

February 01, 2018

Transcript

  1. Node.js Production Checklist
    Gergely Nemeth
    @nthgergo | nemethgergely.com

  2. Hi, I am Gergely.
    ■ Works at GoDaddy here
    ■ Previously RisingStack
    ■ To chat with me:
    – @nthgergo
    [email protected]
    ■ Read my stuff:
    – nemethgergely.com

  3. Why do you need a production
    checklist?

  4. Why do you need a production
    checklist?
    ■ If you provide a “three nines” (99.9%) service,
    this is your error budget:
    – 43 minutes of downtime in a month
    – 2.2 hours of downtime in a quarter
    – 8.8 hours of downtime in a year

  5. Why do you need a production
    checklist?
    ■ If you provide a “five nines” (99.999%) service, the budget shrinks to:
    – 26 seconds of downtime in a month
    – 78 seconds of downtime in a quarter
    – 5 minutes of downtime in a year
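
The downtime figures above follow directly from the availability target. A minimal sketch of the arithmetic, assuming a 30-day month, a 90-day quarter and a 365-day year (the function name and period lengths are illustrative, not from the talk):

```javascript
// Allowed downtime, in seconds, for an availability target over a period.
function errorBudgetSeconds(availabilityPercent, periodDays) {
  const periodSeconds = periodDays * 24 * 60 * 60
  return periodSeconds * ((100 - availabilityPercent) / 100)
}

// Assumed period lengths in days.
const PERIODS = { month: 30, quarter: 90, year: 365 }

for (const target of [99.9, 99.999]) {
  for (const [name, days] of Object.entries(PERIODS)) {
    const minutes = errorBudgetSeconds(target, days) / 60
    console.log(`${target}% over a ${name}: ${minutes.toFixed(1)} minutes`)
  }
}
```

Rounding the results gives the numbers on the slides; the exact values shift slightly if you count calendar months instead of 30-day periods.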

  6. Why do you need a production
    checklist?
    ■ These are called SLAs (service level
    agreements)
    ■ An SLA consists of SLOs (service level
    objectives), such as “99% of requests finish
    in under 200 ms”, and penalties if you don’t
    meet them
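
To make the 200 ms objective concrete, here is a small sketch that checks a batch of measured latencies against it. The function names and the choice to treat "no traffic" as compliant are assumptions for illustration:

```javascript
// Fraction of requests that finished strictly under the threshold.
function fractionUnder(latenciesMs, thresholdMs) {
  if (latenciesMs.length === 0) return 1 // no traffic: nothing violated the objective
  const fast = latenciesMs.filter((ms) => ms < thresholdMs).length
  return fast / latenciesMs.length
}

// True when at least `target` (0.99 = 99%) of requests beat the threshold.
function meetsLatencySlo(latenciesMs, thresholdMs = 200, target = 0.99) {
  return fractionUnder(latenciesMs, thresholdMs) >= target
}

// 99 fast requests and one slow one still meet a 99% objective.
const sample = new Array(99).fill(120).concat([850])
console.log(meetsLatencySlo(sample))
```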

  7. Why do you need a production
    checklist?
    If you are operating a service, you should
    define SLOs for your customers, whether
    they are internal or external

  8. Today, you will learn about:
    ■ Error handling in Node.js
    ■ How to secure your Node.js applications
    ■ How to get full visibility into production systems
    – Best practices for logging
    – Monitoring your Node.js applications
    ■ What to do after all hell breaks loose

  9. Error handling

  10. Error handling
    Async-await changes how we
    write Node.js applications

  11. Error handling
    You have to learn to throw
    again

  12. Error handling - Express
    app.get('/users/:id', (req, res) => {
      const userId = req.params.id
      if (!userId) {
        // res.status() sets the code; res.sendStatus() would already send the response
        return res.status(400).json({
          error: 'Missing id'
        })
      }
      Users.get(userId, (err, user) => {
        if (err) {
          return res.status(500).json(err)
        }
        res.send(user)
      })
    })

  13. Error handling - Express
    - Errors are handled differently across the
    codebase
    - Use Express error handlers instead
    - Migrate from callbacks to async-await

  14. Error handling - Express
    app.get('/users/:id', (req, res, next) => {
      const userId = req.params.id
      if (!userId) {
        const missingIdError = new Error('Missing id')
        missingIdError.httpStatusCode = 400
        return next(missingIdError)
      }
      Users.get(userId, (err, user) => {
        if (err) {
          err.httpStatusCode = 500
          return next(err)
        }
        res.send(user)
      })
    })

  15. Error handling - Express
    app.use((err, req, res, next) => {
      // log the error...
      // fall back to 500 when the error carries no status code
      res.status(err.httpStatusCode || 500).json(err)
    })
    Add the error handler middleware last, after all routes and other middleware

  16. Error handling - Express
    const missingIdError = new Error('Missing id')
    missingIdError.httpStatusCode = 400
    next(missingIdError)

  17. Error handling - Express
    const boom = require('boom')
    app.get('/users/:id', (req, res, next) => {
      const userId = req.params.id
      if (!userId) {
        return next(boom.badRequest('missing id'))
      }
      Users.get(userId, (err, user) => {
        if (err) {
          return next(boom.badImplementation(err))
        }
        res.send(user)
      })
    })

  18. Error handling - Express
    const asyncMiddleware = fn => (req, res, next) => {
      Promise.resolve(fn(req, res, next)).catch(next)
    }
    module.exports = exports = asyncMiddleware
    Wrapping the Express route handlers for
    proper error propagation
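
To see the propagation without starting an Express app, the wrapper can be exercised with stand-ins. In this sketch, `failingHandler` and `runWithStubNext` are hypothetical helpers, and the empty objects stand in for `req` and `res`:

```javascript
const asyncMiddleware = (fn) => (req, res, next) => {
  Promise.resolve(fn(req, res, next)).catch(next)
}

// Hypothetical route handler that fails asynchronously.
const failingHandler = async () => {
  throw new Error('database unavailable')
}

// Stand-in for Express's `next`: resolves with whatever the wrapper passes to it.
function runWithStubNext(handler) {
  return new Promise((resolve) => {
    handler({}, {}, resolve)
  })
}

runWithStubNext(asyncMiddleware(failingHandler)).then((err) => {
  console.log('next() received:', err.message)
})
```

Because the handler is an `async` function, its rejection reaches `next` and from there the error handler middleware. A plain function that throws synchronously would throw before `Promise.resolve` is reached, so the wrapper is meant for `async` handlers.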

  19. Error handling - Express
    app.get('/users/:id', asyncMw(async (req, res) => {
      const userId = req.params.id
      if (!userId) {
        throw boom.badRequest('missing id')
      }
      const user = await getUserAsync(userId)
      res.json(user)
    }))

  20. Error handling – Express
    The modified error handler
    app.use((err, req, res, next) => {
      if (err.isServer) {
        // log the error...
        // probably you don't want to log unauthorized access
        // or do you?
      }
      return res.status(err.output.statusCode).json(err.output.payload)
    })

  21. Error handling - Express
    // With async-await and boom
    app.get('/users/:id', asyncMw(async (req, res) => {
      const userId = req.params.id
      if (!userId) {
        throw boom.badRequest('missing id')
      }
      const user = await getUserAsync(userId)
      res.json(user)
    }))
    // With callbacks
    app.get('/users/:id', (req, res) => {
      const userId = req.params.id
      if (!userId) {
        return res.status(400).json({
          error: 'Missing id'
        })
      }
      Users.get(userId, (err, user) => {
        if (err) {
          return res.status(500).json(err)
        }
        res.send(user)
      })
    })

  22. Securing your application

  23. Securing your application

  24. Securing your application
    Enable two-factor authentication for
    your npm account

  25. Securing your application
    More than 575,000 modules in
    the registry

  26. Securing your application

  27. Securing your application

  28. Securing your application

  29. Securing your application
    You are what you require

  30. Securing your application
    • Node Security Project: security advisories
    • Snyk: Vulnerability DB

  31. Securing your application
    Security HTTP headers
    • X-Frame-Options to mitigate clickjacking attacks,
    • Strict-Transport-Security to keep your users on HTTPS,
    • X-XSS-Protection to prevent reflected XSS attacks,
    • X-DNS-Prefetch-Control to disable browsers’ DNS
    prefetching.
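
As a dependency-free illustration of what setting these headers looks like, here is a sketch using Node's built-in `http` module. The header values are example settings chosen for illustration, not recommendations from the talk, and `setSecurityHeaders` is a hypothetical helper:

```javascript
const http = require('http')

// Example values only; tune them for your application.
const SECURITY_HEADERS = {
  'X-Frame-Options': 'SAMEORIGIN',
  'Strict-Transport-Security': 'max-age=15552000; includeSubDomains',
  'X-XSS-Protection': '1; mode=block',
  'X-DNS-Prefetch-Control': 'off'
}

// Apply every header to a response before it is sent.
function setSecurityHeaders(res) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    res.setHeader(name, value)
  }
}

const server = http.createServer((req, res) => {
  setSecurityHeaders(res)
  res.end('ok')
})
// server.listen(3000) // uncomment to serve
```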

  32. Securing your application
    Security HTTP headers
    const express = require('express')
    const helmet = require('helmet')
    const app = express()
    app.use(helmet())

  33. Securing your application
    Validating user input
    const Joi = require('joi')
    const schema = Joi.object().keys({
      username: Joi.string().alphanum().min(3).max(30).required(),
      access_token: [Joi.string(), Joi.number()],
      birthyear: Joi.number().integer().min(1900).max(2017),
      email: Joi.string().email()
    }).with('username', 'birthyear')
    // Return result
    // (Joi.validate is the API as of this talk; newer Joi releases use schema.validate(value))
    const result = Joi.validate({
      username: 'abc',
      birthyear: 1994
    }, schema)
    // result.error === null -> valid

  34. Logging best practices

  35. Logging best practices
    Requirements
    • Timestamps to know when a given event happened,
    • Format to keep log lines readable for both humans and
    machines,
    • Destination should be the standard output and error only,
    • Support for log levels
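
These four requirements fit in a few lines. The sketch below is not a replacement for a real logging library; `createLogger`, the JSON line format and the level names are assumptions for illustration. The `out` and `err` options exist only so the destination can be swapped in tests:

```javascript
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 }

function createLogger({ level = 'info', out = process.stdout, err = process.stderr } = {}) {
  const threshold = LEVELS[level]
  const logger = {}
  for (const [name, severity] of Object.entries(LEVELS)) {
    logger[name] = (message, meta = {}) => {
      if (severity > threshold) return // below the configured level: drop it
      // One JSON object per line: greppable for humans, parseable for machines.
      const line = JSON.stringify({
        ...meta,
        timestamp: new Date().toISOString(),
        level: name,
        message
      })
      // Errors and warnings go to stderr, everything else to stdout.
      const stream = severity <= LEVELS.warn ? err : out
      stream.write(line + '\n')
    }
  }
  return logger
}

const logger = createLogger({ level: 'info' })
logger.info('server started listening', { port: 3000 })
logger.debug('dropped, because the level is info')
```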

  36. Logging best practices
    Using log levels
    • Error
    • General errors, always reported
    • Used whenever an unexpected error happens which
    prevents further processing
    • The app may try to recover (for example when a database
    connection is lost) or forcefully terminate

  37. Logging best practices
    Using log levels
    • Warn
    • For events indicating irregular circumstances, with
    a clearly defined recovery strategy
    • They have no impact on system availability or performance
    • These events should be reported too

  38. Logging best practices
    Using log levels
    • Info
    • These events indicate major state changes in the
    application, like the startup of the HTTP server
    • Each component should log:
    • When it starts and when it becomes operational
    • When it starts shutting down, and just before it stops

  39. Logging best practices
    Using log levels
    • Debug
    • Diagnostic-level events, for internal state changes
    • These events are usually not reported, just for
    troubleshooting
    • At the discretion of the engineer developing the system
    component

  40. Logging best practices
    Example
    logger.info('server started initializing')
    const app = express()
    app.get('/', (req, res) => {
      res.send('ok!')
    })
    app.listen(PORT, (err) => {
      if (err) {
        return logger.error(err)
      }
      logger.info(`server started listening on ${PORT}`)
    })

  41. Logging best practices
    Log the original URL of the request for errors
    app.use((err, req, res, next) => {
      if (err.isServer) {
        logger.error(err.message, {
          stack: err.stack,
          originalUrl: req.originalUrl
        })
      }
      return res.status(err.output.statusCode).json(err.output.payload)
    })

  42. Graceful shutdown
    • When deploying new versions of your application, you will
    replace old versions
    • Listen for SIGTERM
    • Stop accepting new requests
    • Serve ongoing requests
    • Clean up the resources your app used
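
The steps above can be sketched with nothing but Node's built-in `http` module. The `shutdown` and `start` helpers are hypothetical names, and the cleanup body is a placeholder for whatever your application holds open:

```javascript
const http = require('http')

// Stop accepting new connections, let in-flight requests finish, then clean up.
function shutdown(server, cleanup = async () => {}) {
  return new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()))
  }).then(cleanup)
}

function start(port) {
  const server = http.createServer((req, res) => res.end('ok'))
  server.listen(port)
  process.once('SIGTERM', () => {
    shutdown(server, async () => {
      // close database connections, flush logs, ... (application specific)
    }).then(
      () => process.exit(0),
      () => process.exit(1)
    )
  })
  return server
}

// start(3000) // uncomment to run the server
```

Depending on the Node.js version, `server.close()` may wait for idle keep-alive connections before its callback fires, so a production setup usually adds a timeout or closes those connections explicitly.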

  43. Graceful shutdown
    Meet Terminus
    • Provides
    • Graceful shutdown,
    • Health checks for HTTP applications
    • Works with any Node.js HTTP server
    • Built on stoppable
    • github.com/godaddy/terminus

  44. Graceful shutdown
    Meet Terminus

  45. Monitoring your applications

  46. Monitoring your applications
    Most important metrics to watch
    • Error rate, as they directly affect customer
    satisfaction;
    • Latency, as the slower the service, the more likely your
    customers are to close your application;
    • Throughput, to put error rate and latency in context;
    • Saturation, to tell if you can handle more traffic.
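
Three of these four metrics can be derived from a list of finished requests. A minimal sketch, assuming each sample has the hypothetical shape `{ durationMs, statusCode }` and that only 5xx responses count as errors:

```javascript
// Summarize a window of finished requests.
function summarize(samples, windowSeconds) {
  const total = samples.length
  const errors = samples.filter((s) => s.statusCode >= 500).length
  const sorted = samples.map((s) => s.durationMs).sort((a, b) => a - b)
  // Nearest-rank 95th percentile, using integer math for the rank.
  const rank = Math.ceil((95 * total) / 100)
  return {
    errorRate: total === 0 ? 0 : errors / total,
    p95LatencyMs: total === 0 ? 0 : sorted[rank - 1],
    throughputPerSecond: total / windowSeconds
  }
}

const window = [
  { durationMs: 35, statusCode: 200 },
  { durationMs: 48, statusCode: 200 },
  { durationMs: 910, statusCode: 500 },
  { durationMs: 41, statusCode: 200 }
]
console.log(summarize(window, 2))
```

Saturation is not derivable from request records alone; it needs resource data such as CPU, memory or event loop delay.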

  47. Monitoring your applications
    How is your app doing?

  48. Monitoring your applications
    Aggregated metrics have little value

  49. Monitoring your applications
    You want to have route-level metrics
    This way of monitoring is called white box monitoring
    Tools like Prometheus, New Relic, Opbeat or Dynatrace can help to implement it.
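
To show what "route-level" means in practice, here is an in-memory sketch written as Express-style middleware. `RouteMetrics` is a hypothetical class, not part of any of the tools named above, and it assumes `req.route` is populated once a route has matched:

```javascript
// Per-route request statistics kept in memory.
class RouteMetrics {
  constructor() {
    this.routes = new Map()
  }

  record(method, route, statusCode, durationMs) {
    const key = `${method} ${route}`
    const stats = this.routes.get(key) || { count: 0, errors: 0, totalMs: 0 }
    stats.count += 1
    stats.totalMs += durationMs
    if (statusCode >= 500) stats.errors += 1
    this.routes.set(key, stats)
  }

  // Keyed by the route pattern, not the raw URL,
  // so /users/1 and /users/2 are counted together.
  middleware() {
    return (req, res, next) => {
      const startedAt = process.hrtime()
      res.on('finish', () => {
        const [seconds, nanoseconds] = process.hrtime(startedAt)
        const route = req.route ? req.route.path : 'unmatched'
        this.record(req.method, route, res.statusCode, seconds * 1e3 + nanoseconds / 1e6)
      })
      next()
    }
  }
}

const metrics = new RouteMetrics()
// app.use(metrics.middleware()) // register before the routes in an Express app
```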

  50. Monitoring your applications
    Can users access it?
    This way of monitoring is called black box monitoring
    Tools like Pingdom or ping.apex.sh provide these services.
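
A black box check only needs the public URL. The sketch below is a hypothetical `probe` helper built on Node's `http` module (plain HTTP only; an HTTPS target would need the `https` module), treating any 2xx or 3xx response within the timeout as healthy:

```javascript
const http = require('http')

// Resolve with { ok, statusCode, durationMs }; never rejects.
function probe(url, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const startedAt = Date.now()
    const done = (ok, statusCode) =>
      resolve({ ok, statusCode, durationMs: Date.now() - startedAt })
    const req = http.get(url, (res) => {
      res.resume() // discard the body, we only care about the status
      res.on('end', () => done(res.statusCode >= 200 && res.statusCode < 400, res.statusCode))
    })
    req.setTimeout(timeoutMs, () => req.destroy(new Error('timeout')))
    req.on('error', () => done(false, null))
  })
}

// probe('http://localhost:3000/').then(console.log) // example call
```

Hosted services add what a single script cannot: probes from several regions and alerting when a check fails.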

  51. Incident handling

  52. Incident handling
    You build it, you run it

  53. Incident handling
    You build it, you run it
    • Teams will think about how their software is going to run
    in production
    • Encourages ownership and accountability which leads to
    more independent, responsible teammates
    • Leads to operational excellence
    • Which leads to more satisfied customers

  54. Disaster recovery
    • Disasters can have natural or human-induced causes
    • Involves tools, policies and procedures to enable
    recovery from disasters

  55. Disaster recovery
    Meet Ark (by Heptio)
    • Utility for managing disaster recovery for Kubernetes cluster
    resources and persistent volumes
    • Helps with
    • Disaster recovery
    • Cloud provider migration
    • Clone the production environment for development / testing

  56. Disaster recovery
    Meet Ark

  57. Disaster recovery
    Meet Ark
    • ark restore create [backup-name] [selector]

  58. Thanks!
    I’ll be around, just say hi if you want
    to talk

  59. Oh, and we are
    hiring!
    I’ll be around, just say hi if you want
    to talk

  60. Resources
    • https://blog.risingstack.com/node-js-logging-tutorial/
    • https://nemethgergely.com/nodejs-security-overview/
    • https://nodesecurity.io/
    • https://wiki.opendaylight.org/view/BestPractices/Logging_Best_Practices
    • https://blog.risingstack.com/node-js-security-checklist/
