Node.js production checklist

At RisingStack, we have helped enterprises successfully adopt Node.js over the past years. During this period, we have seen a lot of anti-patterns and war stories, and we came up with a list of best practices for running Node.js applications in production.

This talk covers operational, performance, and security best practices.

Gergely Nemeth

November 25, 2017

Transcript

  1. Node.js Production Checklist
    Gergely Nemeth
    @nthgergo | nemethgergely.com

  2. Hi, I am Gergely.
    ■ Works at GoDaddy
    ■ Previously RisingStack
    ■ To chat with me:
    – @nthgergo
    [email protected]
    ■ Read my stuff:
    – nemethgergely.com

  3. Node.js Production Checklist
    Gergely Nemeth
    @nthgergo | nemethgergely.com

  4. Why do you need a production
    checklist?

  5. Why do you need a production
    checklist?
    ■ Providing a “three nines” (99.9%) service,
    this is your error budget:
    – 43 minutes of downtime in a month
    – 2.2 hours of downtime in a quarter
    – 8.8 hours of downtime in a year

  6. Why do you need a production
    checklist?
    ■ Providing a “five nines” (99.999%) service
    – 26 seconds of downtime in a month
    – 78 seconds of downtime in a quarter
    – 5 minutes of downtime in a year
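
    The downtime figures on these two slides follow from simple arithmetic. A minimal sketch of that calculation is below; the function name and the period lengths (30-day month, 90-day quarter, 365-day year) are assumptions made for illustration, not part of the talk.

```javascript
// Allowed downtime (in seconds) for an availability target over a period.
// availabilityPercent: for example 99.9; periodDays: for example 30 for a month.
function downtimeBudgetSeconds(availabilityPercent, periodDays) {
  const periodSeconds = periodDays * 24 * 60 * 60
  return periodSeconds * (100 - availabilityPercent) / 100
}

// Assumed period lengths: 30-day month, 90-day quarter, 365-day year.
const periods = { month: 30, quarter: 90, year: 365 }

for (const target of [99.9, 99.999]) {
  for (const [name, days] of Object.entries(periods)) {
    const seconds = downtimeBudgetSeconds(target, days)
    console.log(`${target}% over a ${name}: ${(seconds / 60).toFixed(1)} minutes`)
  }
}
```

    Plugging in your own target makes it easy to see how quickly the budget shrinks with every extra nine.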

  7. Why do you need a production
    checklist?
    ■ These are called SLAs (service level
    agreements)
    ■ SLAs consist of SLOs (service level
    objectives), such as 99% of requests finishing
    in under 200 ms, and penalties if you don’t
    meet them
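
    An objective like "99% of requests under 200 ms" can be checked mechanically against measured latencies. The sketch below is one way to do that; `meetsLatencySlo` and the sample data are hypothetical and only illustrate the idea.

```javascript
// True when at least targetPercent of the requests finished under thresholdMs.
function meetsLatencySlo(latenciesMs, thresholdMs, targetPercent) {
  if (latenciesMs.length === 0) return true // no traffic, nothing violated
  const fast = latenciesMs.filter((ms) => ms < thresholdMs).length
  // Integer comparison avoids floating point rounding on the ratio.
  return fast * 100 >= targetPercent * latenciesMs.length
}

// Hypothetical sample: 99 fast requests and one slow outlier.
const sample = new Array(99).fill(120).concat([850])
console.log(meetsLatencySlo(sample, 200, 99))
```

    In practice the latencies would come from your monitoring system over a fixed window, such as the last 30 days.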

  8. Why do you need a production
    checklist?
    If you are operating a service, you should
    define SLOs for your customers – no matter if
    they are internal or external

  9. Today, you will learn about:
    ■ Error handling in Node.js
    ■ How to secure your Node.js applications
    ■ Building a CI/CD pipeline for Node.js
    ■ How to get full visibility into production systems
    – Best practices for logging
    – Monitoring your Node.js applications
    ■ What to do when all hell breaks loose

  10. Error handling

  11. Error handling

  12. Error handling
    Async-await changes how we
    write Node.js applications

  13. Error handling
    You have to learn to throw
    again

  14. Error handling - Express
    app.get('/users/:id', (req, res) => {
      const userId = req.params.id
      if (!userId) {
        // status() stays chainable; sendStatus() would already end the response
        return res.status(400).json({
          error: 'Missing id'
        })
      }
      Users.get(userId, (err, user) => {
        if (err) {
          return res.status(500).json(err)
        }
        res.send(user)
      })
    })

  15. Error handling - Express
    - Errors are handled differently across the
    codebase
    -> Use Express error handlers instead
    - Migrate from callbacks to async-await

  16. Error handling - Express
    app.get('/users/:id', (req, res, next) => {
      const userId = req.params.id
      if (!userId) {
        const missingIdError = new Error('Missing id')
        missingIdError.httpStatusCode = 400
        return next(missingIdError)
      }
      Users.get(userId, (err, user) => {
        if (err) {
          err.httpStatusCode = 500
          return next(err)
        }
        // send the fetched user, not the Users model
        res.send(user)
      })
    })

  17. Error handling - Express
    app.use((err, req, res, next) => {
      // log the error...
      // status() stays chainable; fall back to 500 when no status code was set
      res.status(err.httpStatusCode || 500).json({ error: err.message })
    })
    Add an error handler middleware as the last one

  18. Error handling - Express
    const missingIdError = new Error('Missing id')
    missingIdError.httpStatusCode = 400
    next(missingIdError)



  20. Error handling - Express
    const boom = require('boom')
    app.get('/users/:id', (req, res, next) => {
      const userId = req.params.id
      if (!userId) {
        return next(boom.badRequest('missing id'))
      }
      Users.get(userId, (err, user) => {
        if (err) {
          return next(boom.badImplementation(err))
        }
        // send the fetched user, not the Users model
        res.send(user)
      })
    })

  21. Error handling - Express
    const asyncMiddleware = fn => (req, res, next) => {
      Promise.resolve(fn(req, res, next)).catch(next);
    };
    module.exports = exports = asyncMiddleware;
    Wrapping the Express route handlers for
    proper error propagation
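
    To see what the wrapper does without starting a server, the sketch below calls it with stand-in objects. The failing handler, the empty request and response objects, and the `forwarded` array are assumptions made for this illustration only.

```javascript
// Same wrapper as on the slide: a rejected promise is forwarded to next().
const asyncMiddleware = fn => (req, res, next) => {
  Promise.resolve(fn(req, res, next)).catch(next)
}

// Hypothetical async handler that fails, standing in for a real route.
const failingHandler = async (req, res) => {
  throw new Error('database unavailable')
}

// Minimal stand-ins for the Express request, response and next callback.
const forwarded = []
const next = (err) => forwarded.push(err)

asyncMiddleware(failingHandler)({}, {}, next)

// The rejection is delivered asynchronously, so inspect it on a later tick.
setImmediate(() => {
  console.log(forwarded[0].message)
})
```

    Note that only rejections are forwarded: a handler that is not an async function and throws synchronously would throw inside the wrapper instead of reaching `next`.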

  22. Error handling – Express
    The modified error handler
    app.use((err, req, res, next) => {
      if (err.isServer) {
        // log the error...
        // probably you don't want to log unauthorized access
        // or do you?
      }
      return res.status(err.output.statusCode).json(err.output.payload);
    })

  23. Error handling - Express
    // the handler must be declared async to use await
    app.get('/users/:id', asyncMw(async (req, res) => {
      const userId = req.params.id
      if (!userId) {
        throw boom.badRequest('missing id')
      }
      const user = await getUserAsync(userId)
      res.json(user)
    }))

  24. Error handling - Express
    // after: async-await with throw
    app.get('/users/:id', asyncMw(async (req, res) => {
      const userId = req.params.id
      if (!userId) {
        throw boom.badRequest('missing id')
      }
      const user = await getUserAsync(userId)
      res.json(user)
    }))
    // before: callbacks with ad hoc error responses
    app.get('/users/:id', (req, res) => {
      const userId = req.params.id
      if (!userId) {
        return res.status(400).json({
          error: 'Missing id'
        })
      }
      Users.get(userId, (err, user) => {
        if (err) {
          return res.status(500).json(err)
        }
        res.send(user)
      })
    })

  25. Error handling - Express
    • Have unified error handling
    • Start using async-await with throw and try-catch

  26. Securing your application

  27. Securing your application

  28. Securing your application
    Enable two-factor authentication for
    your npm account

  29. Securing your application
    More than 500,000 modules in
    the registry

  30. Securing your application

  31. Securing your application

  32. Securing your application

  33. Securing your application
    • Node Security Project: security advisories
    • Snyk: Vulnerability DB

  34. Securing your application
    Security HTTP headers
    • X-Frame-Options to mitigate clickjacking attacks,
    • Strict-Transport-Security to keep your users on HTTPS,
    • X-XSS-Protection to prevent reflected XSS attacks,
    • X-DNS-Prefetch-Control to disable browsers’ DNS
    prefetching.
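
    To make the list concrete, here is a dependency-free sketch of a middleware that sets these four headers by hand. The header values are illustrative assumptions rather than the defaults of any particular library, and the fake response object exists only so the example runs without a server.

```javascript
// Illustrative values; tune them for your application.
const SECURITY_HEADERS = {
  'X-Frame-Options': 'SAMEORIGIN',
  'Strict-Transport-Security': 'max-age=15552000; includeSubDomains',
  'X-XSS-Protection': '1; mode=block',
  'X-DNS-Prefetch-Control': 'off'
}

// Express-style middleware: set every header, then continue the chain.
function securityHeaders(req, res, next) {
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    res.setHeader(name, value)
  }
  next()
}

// Usage with a fake response object, to show what ends up on the response.
const sent = {}
const fakeRes = { setHeader: (name, value) => { sent[name] = value } }
securityHeaders({}, fakeRes, () => console.log(sent))
```

    In a real application a maintained module is usually a better choice than a hand-written list, because recommended headers change over time.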

  35. Securing your application
    Security HTTP headers
    const express = require('express')
    const helmet = require('helmet')
    const app = express()
    app.use(helmet())

  36. Securing your application
    Validating user input
    const Joi = require('joi');
    const schema = Joi.object().keys({
      username: Joi.string().alphanum().min(3).max(30).required(),
      access_token: [Joi.string(), Joi.number()],
      birthyear: Joi.number().integer().min(1900).max(2017),
      email: Joi.string().email()
    }).with('username', 'birthyear')
    // Return result
    // Joi.validate() belongs to Joi releases before v16;
    // later releases use schema.validate(value) instead
    const result = Joi.validate({
      username: 'abc',
      birthyear: 1994
    }, schema)
    // result.error === null -> valid
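
    Validation pays off most when failures flow into the same error handler as every other error. The sketch below shows one way to wire that up. It uses a hand-written validator instead of Joi so that it runs without dependencies; `validateUser`, `validateBody` and the two rules it checks are hypothetical and cover only part of the schema above.

```javascript
// Hypothetical validator: returns an error message, or null when valid.
function validateUser(body) {
  if (typeof body.username !== 'string' || !/^[a-z0-9]{3,30}$/i.test(body.username)) {
    return 'username must be 3-30 alphanumeric characters'
  }
  if (body.birthyear !== undefined &&
      (!Number.isInteger(body.birthyear) || body.birthyear < 1900 || body.birthyear > 2017)) {
    return 'birthyear must be an integer between 1900 and 2017'
  }
  return null
}

// Middleware factory: invalid input becomes a 400 error passed to next().
const validateBody = (validator) => (req, res, next) => {
  const message = validator(req.body || {})
  if (message) {
    const err = new Error(message)
    err.httpStatusCode = 400
    return next(err)
  }
  next()
}

// Usage with stand-in request objects instead of a running server.
const check = validateBody(validateUser)
check({ body: { username: 'abc', birthyear: 1994 } }, {}, (err) => console.log(err))
check({ body: { username: 'a' } }, {}, (err) => console.log(err.httpStatusCode, err.message))
```

    The route handler behind this middleware can then assume well-formed input and stay free of validation branches.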

  37. Logging best practices

  38. Logging best practices
    Requirements
    • Timestamps to know when a given event happened,
    • Format to keep log lines readable for both humans and
    machines,
    • Destination should be the standard output and error only,
    • Support for log levels
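
    The four requirements fit in a few lines of code. The following is a minimal sketch, not a replacement for an established logging library; `createLogger`, `formatLine` and the JSON line format are assumptions chosen for illustration.

```javascript
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 }

// Build one log line: JSON is machine-parseable and still readable.
function formatLine(level, message, meta = {}) {
  return JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...meta })
}

// threshold is the most verbose level that still gets written.
function createLogger(threshold = 'info') {
  const log = (level) => (message, meta) => {
    if (LEVELS[level] > LEVELS[threshold]) return
    // Errors and warnings go to stderr, everything else to stdout.
    const stream = LEVELS[level] <= LEVELS.warn ? process.stderr : process.stdout
    stream.write(formatLine(level, message, meta) + '\n')
  }
  return { error: log('error'), warn: log('warn'), info: log('info'), debug: log('debug') }
}

const logger = createLogger('info')
logger.info('server started listening', { port: 3000 })
logger.debug('cache warmed') // below the threshold, not written
```

    Writing only to standard output and error leaves routing, rotation and storage to the environment that runs the process.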

  39. Logging best practices
    Using log levels
    • Error
    • General errors, always reported
    • Used whenever an unexpected error happens which
    prevents further processing
    • The app may try to recover (such as when a database
    connection is lost) or forcefully terminate

  40. Logging best practices
    Using log levels
    • Warn
    • For events indicating irregular circumstances, with
    clearly defined recovery strategy
    • It has no impact on system availability or performance
    • These events should be reported too

  41. Logging best practices
    Using log levels
    • Info
    • These events indicate major state changes in the
    application, like the startup of the HTTP server
    • Each component should log:
    • When it starts and when it becomes operational
    • When it starts shutting down, and just before it stops

  42. Logging best practices
    Using log levels
    • Debug
    • Diagnostic-level events, for internal state changes
    • These events are usually not reported, just for
    troubleshooting
    • At the discretion of the engineer developing the system
    component

  43. Logging best practices
    Example
    logger.info('server started initializing')
    const app = express()
    app.get('/', (req, res) => {
      res.send('ok!')
    })
    app.listen(PORT, (err) => {
      if (err) {
        return logger.error(err)
      }
      logger.info(`server started listening on ${PORT}`)
    })

  44. Logging best practices
    Log the original URL of the request for errors
    app.use((err, req, res, next) => {
      if (err.isServer) {
        logger.error(err.message, {
          stack: err.stack,
          originalUrl: req.originalUrl
        })
      }
      return res.status(err.output.statusCode).json(err.output.payload);
    })

  45. Building a CI/CD pipeline

  46. Building a CI/CD pipeline
    • Protect the master branch
    • Only approved pull requests can be merged
    • Contributors with write access can approve

  47. Building a CI/CD pipeline
    • Run automated checks on your codebase:
    • Linting
    • Unit, integration, functional tests
    • Code coverage
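
    One way to run these checks in order and fail fast is a small Node.js script that the CI job invokes. This is a sketch under assumptions: the npm script names (`lint`, `test`, `coverage`) are hypothetical and must match whatever your package.json defines, and most CI systems can express the same steps in their own configuration instead.

```javascript
const { spawnSync } = require('child_process')

// Assumed npm script names; adjust them to whatever your package.json defines.
const CHECKS = [
  { name: 'lint', command: 'npm', args: ['run', 'lint'] },
  { name: 'tests', command: 'npm', args: ['test'] },
  { name: 'coverage', command: 'npm', args: ['run', 'coverage'] }
]

// Default runner: execute the command and report whether it exited with 0.
const runCommand = ({ command, args }) =>
  spawnSync(command, args, { stdio: 'inherit' }).status === 0

// Run checks in order and stop at the first failure.
// Returns the name of the failed check, or null when everything passed.
function runChecks(checks, run = runCommand) {
  for (const check of checks) {
    if (!run(check)) return check.name
  }
  return null
}

// In a CI job this would be the entry point (not executed here):
// const failed = runChecks(CHECKS)
// if (failed) { console.error(`${failed} failed`); process.exit(1) }
```

    The runner is injectable so the ordering and fail-fast logic can itself be unit tested without spawning processes.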

  48. Building a CI/CD pipeline
    • With good test coverage
    • You can deploy to production on every commit
    • Reduced risk
    • Faster reaction times
    • Flexible release options

  49. Monitoring your applications

  50. Monitoring your applications
    Most important metrics to watch
    • Error rate, as errors directly affect customer
    satisfaction;
    • Latency, as the slower the service, the more likely your
    customers are to close your application;
    • Throughput, to put error rate and latency in context;
    • Saturation, to tell if you can handle more traffic.
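
    Three of these four metrics can be derived from plain request records. The sketch below assumes a hypothetical record shape (`status`, `durationMs`) and a nearest-rank 95th percentile; saturation is left out because it needs resource measurements such as CPU or memory usage rather than request data.

```javascript
// requests: [{ status, durationMs }] observed during a window of windowSeconds.
function summarize(requests, windowSeconds) {
  const errors = requests.filter((r) => r.status >= 500).length
  const sorted = requests.map((r) => r.durationMs).sort((a, b) => a - b)
  // Nearest-rank 95th percentile of the latency distribution.
  const p95Index = Math.max(0, Math.ceil(sorted.length * 95 / 100) - 1)
  return {
    errorRate: requests.length ? errors / requests.length : 0,
    p95LatencyMs: sorted.length ? sorted[p95Index] : 0,
    throughputPerSecond: requests.length / windowSeconds
  }
}

// Hypothetical one-minute window: 18 fast successes, 1 slow success, 1 failure.
const requests = [
  ...new Array(18).fill({ status: 200, durationMs: 40 }),
  { status: 200, durationMs: 900 },
  { status: 500, durationMs: 55 }
]
console.log(summarize(requests, 60))
```

    Reading the numbers together matters: the same error count means something very different at ten requests per minute than at ten thousand.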

  51. Monitoring your applications
    How is your app doing?

  52. Monitoring your applications
    Aggregated metrics have little value

  53. Monitoring your applications
    You want to have route-level metrics.
    This way of monitoring is called white box monitoring.
    Tools like Prometheus, New Relic, Opbeat or Dynatrace can help to implement it.
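
    A small example shows why the route level matters. In the hypothetical traffic below the overall error rate is only 5%, yet one route fails half of the time; `errorRateByRoute` and the route names are made up for this illustration.

```javascript
// Group request records by route and compute an error rate for each one.
function errorRateByRoute(requests) {
  const stats = new Map()
  for (const { route, status } of requests) {
    const entry = stats.get(route) || { total: 0, errors: 0 }
    entry.total += 1
    if (status >= 500) entry.errors += 1
    stats.set(route, entry)
  }
  const rates = {}
  for (const [route, { total, errors }] of stats) {
    rates[route] = errors / total
  }
  return rates
}

// Hypothetical traffic: 100 requests, 5 of them failed, all on one route.
const traffic = [
  ...new Array(90).fill({ route: 'GET /products', status: 200 }),
  ...new Array(5).fill({ route: 'POST /checkout', status: 200 }),
  ...new Array(5).fill({ route: 'POST /checkout', status: 500 })
]
console.log(errorRateByRoute(traffic))
```

    An alert on the aggregated figure would likely stay silent here, while an alert per route would fire for the checkout endpoint.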

  54. Monitoring your applications
    Can users access it?
    This way of monitoring is called black box monitoring.
    Tools like Pingdom or ping.apex.sh provide these services.
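
    The core of a black box check is a request made from outside the application, judged only by what a user would see. Below is a minimal sketch using Node's built-in `http` module; the `probe` function, the 5 second default timeout and the "2xx or 3xx means up" rule are assumptions, and an HTTPS endpoint would need the `https` module instead.

```javascript
const http = require('http')

// Resolve with the outcome of one probe; never rejects, so it is easy to schedule.
function probe(url, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const startedAt = Date.now()
    const done = (up, detail) => resolve({ up, detail, durationMs: Date.now() - startedAt })

    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume() // discard the body, only the status matters here
      done(res.statusCode >= 200 && res.statusCode < 400, res.statusCode)
    })
    // The timeout event does not abort the request by itself.
    req.on('timeout', () => req.destroy(new Error('timeout')))
    req.on('error', (err) => done(false, err.message))
  })
}

// Usage (hypothetical URL): probe('http://localhost:3000/healthcheck').then(console.log)
```

    A hosted service adds what a single script cannot: probes from several regions, scheduling, and alerting when checks fail.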

  55. Incident handling

  56. Incident handling
    You build it, you run it

  57. Incident handling
    You build it, you run it
    • Teams will think about how their software is going to run
    in production
    • Encourages ownership and accountability which leads to
    more independent, responsible teammates
    • Leads to operational excellence
    • Which leads to more satisfied customers

  58. Incident handling
    When all hell breaks loose
    • Have a “War Room” / “On call” channel in your instant
    messaging system
    • When the issue happens, notify your customers through a
    status page, and keep it up to date throughout the investigation
    • Do post-mortems on what led to the issue, and what
    actions will be taken to prevent it in the future

  59. Incident handling
    The human aspect
    • At least 3 people for a 24/7 schedule
    • With distributed teams you can minimize night shifts
    • Set up a pager with escalation policies
    • Have a secondary team on standby

  60. Incident handling
    The human aspect
    • A fair number is one or two issues per on-call shift
    • With more, you risk:
    • Burning out your team
    • Decreasing the work quality because of lost focus
    • With fewer, they may lose touch with the production system

  61. Summary
    • How to handle errors in Node.js using async-await
    • The most important tasks automated by a CI/CD pipeline
    • Logging and monitoring best practices
    • Incident handling

  62. Thanks!
    I’ll be around, just say hi if you want
    to talk

  63. Oh, and we are
    hiring!
    I’ll be around, just say hi if you want to talk
    https://careers.godaddy.net/

  64. Resources
    • https://blog.risingstack.com/node-js-logging-tutorial/
    • https://nemethgergely.com/nodejs-security-overview/
    • https://nodesecurity.io/
    • https://wiki.opendaylight.org/view/BestPractices/Logging_Best_Practices
    • https://blog.risingstack.com/node-js-security-checklist/
