Profiling Node.js apps on production

A talk about profiling Node.js apps on production with Linux perf_events and FlameGraph/FlameScope, plus a couple of findings.

See https://shuheikagawa.com/blog/2018/09/16/node-js-under-a-microscope/ for more details

Shuhei Kagawa

June 06, 2019

Transcript

  1.
    Profiling Node.js apps on production

  2.
    Hi, I’m...
    Shuhei Kagawa (@shuheikagawa)
    Software Engineer at Zalando

  3.
    Microservices

  4.
    Node.js servers with React SSR

  5.
    Performance issue

  6.
    Mysterious Gap
    500 milliseconds between API server and API client

  7.
    test env
    production env

  9.
    Linux perf

  10.
    Small overhead

  11.
    JS & native

  12.
    node --perf-basic-prof-only-functions
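
    With this flag, V8 writes a symbol map to /tmp/perf-<pid>.map, which perf uses to turn addresses of JIT-compiled JavaScript into function names. A minimal sketch of starting a server with the flag and checking for the map; `server.js` and the `perf_map_path` helper are hypothetical names, not taken from the talk:

    ```shell
    # Hypothetical entry point; replace with the real server script.
    APP_ENTRY="${APP_ENTRY:-server.js}"

    # Path of the symbol map V8 writes for a given process ID.
    perf_map_path() {
        echo "/tmp/perf-$1.map"
    }

    # Only start the app if node and the entry point are actually available.
    if command -v node >/dev/null 2>&1 && [ -f "$APP_ENTRY" ]; then
        node --perf-basic-prof-only-functions "$APP_ENTRY" &
        pid=$!
        sleep 1   # give V8 a moment to create the map file
        ls -l "$(perf_map_path "$pid")"
    fi
    ```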

  13.
    # Install dependencies for `perf` command
    sudo apt-get install linux-tools-common
    sudo apt-get install linux-tools-$(uname -r)

  14.
    # Record stack traces 99 times per second for 30 seconds
    sudo perf record -F 99 -p ${pid} -g -- sleep 30s
    # Generate human readable stack traces
    sudo perf script > stacks.${pid}.out

  15.
    Image from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
    2,970 Stack Traces!!!
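
    The 2,970 figure is presumably the sampling frequency multiplied by the recording duration of the perf record command. A quick check of that arithmetic, treating the result as an upper bound for a single thread that stays on-CPU the whole time (an assumption; idle time yields fewer samples):

    ```shell
    frequency_hz=99   # from -F 99
    duration_s=30     # from sleep 30s
    expected_samples=$((frequency_hz * duration_s))
    echo "$expected_samples"   # prints 2970
    ```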

  17.
    Image from https://github.com/brendangregg/FlameGraph
    CPU Flame Graph by Brendan Gregg
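
    Turning perf script output into a flame graph takes two scripts from the FlameGraph repository: stackcollapse-perf.pl folds the stack traces into one line per unique stack, and flamegraph.pl renders them as an SVG. A sketch, assuming the repository is cloned at the hypothetical path in FLAMEGRAPH_DIR; `render_flamegraph` is an illustrative helper name:

    ```shell
    # Assumption: location of a local clone of https://github.com/brendangregg/FlameGraph
    FLAMEGRAPH_DIR="${FLAMEGRAPH_DIR:-$HOME/FlameGraph}"

    # Fold the stacks in $1 and render them as an SVG flame graph at $2.
    render_flamegraph() {
        stacks_file="$1"
        svg_file="$2"
        "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" < "$stacks_file" \
            | "$FLAMEGRAPH_DIR/flamegraph.pl" > "$svg_file"
    }

    # Usage: render_flamegraph "stacks.${pid}.out" "flame.${pid}.svg"
    ```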

  18.
    CPU Flame Graph
    GZIP
    JSON.parse
    JSON.parse
    React

  19.
    Nothing looks so wrong…?

  20.
    https://github.com/Netflix/flamescope
    FlameScope
    by Netflix cloud performance team
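
    FlameScope works from the text output of perf script. A sketch of preparing a profile for it, assuming FlameScope is cloned at the hypothetical path in FLAMESCOPE_DIR and serves profiles from its examples/ directory (check the project README for the current setup); `install_profile` is an illustrative helper name:

    ```shell
    # Assumption: location of a local clone of https://github.com/Netflix/flamescope
    FLAMESCOPE_DIR="${FLAMESCOPE_DIR:-$HOME/flamescope}"

    # Copy a perf script output file to where FlameScope is assumed to look for profiles.
    install_profile() {
        mkdir -p "$FLAMESCOPE_DIR/examples"
        cp "$1" "$FLAMESCOPE_DIR/examples/"
    }

    # Usage, after recording with perf record:
    #   sudo perf script --header > "stacks.${pid}.out"
    #   install_profile "stacks.${pid}.out"
    # --header adds the perf.data header to the output; whether FlameScope
    # requires it is an assumption here.
    ```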

  21.
    Finding 1: Metrics Collection from Histograms
    Busy for ~1.5s!

  22.
    Finding 1: Metrics Collection from Histograms
    JSON.stringify() and JSON.parse() in a metrics library

  23.
    Finding 2: Garbage Collection
    Busy for ~400ms once every ~10s

  24.
    Finding 2: Garbage Collection
    Unused fallback cache was causing slow GCs

  26.
    p99 response time: 50% ⬇

  28.
    Be Bold

  29.
    Thank you!

  30.
    Links
    • CPU Flame Graphs
    • FlameScope
    • A sample project
    • How to fix wrong symbols
    • Node.js under a Microscope