Performance Profiling the Unexpected in Node.js

Fred K. Schott
February 11, 2015

Performance testing a Node.js application can be painful. Tooling can be a nightmare to set up, and online help is almost non-existent. And even with the perfect setup in place... what are you even supposed to be looking for?

This is the position that we found ourselves in last year at Box, and this talk is the story of everything we've learned since. Instead of listing every relevant tool or teaching you how to set up awesome flame graphs, I'm going to share the surprising and unexpected things we learned about Node.js performance at Box, and how we reduced server response times by over 80% along the way. Sometimes just knowing what to look for is half the battle.

Presented at NodeSummit, 2015
Fonts: Lulo Clean, Avenir, DIN

Transcript

  1. BOX MOBILE SITE 2013
  2. WEB APP ARCHITECTURE CIRCA 2013

    ONE BIG PHP APPLICATION
    MOBILE SPAGHETTI
    BROKEN CONTRACTS
  3. ENTER NODE.JS

    MODULARIZE THE MONOLITH
    STRONGER CONTRACTS
    FASTER DEVELOPMENT
    ALL JS ON THE FRONT-END
  4. BOX MOBILE SITE 2014
  5. THE SETUP

    MOCKED OUT API (250MS RESPONSE)
    ACTS AS AN ACTUAL EXTERNAL SERVER
    SET RESPONSE TIME, VARIANCE, STABILITY
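
A mock API along the lines this slide describes could be sketched as follows. This is an illustration, not the harness used at Box: `createMockApi`, `pickDelay`, and the option names `baseMs`, `varianceMs`, and `failureRate` are hypothetical, and only Node's built-in `http` module is assumed.

```javascript
const http = require('http');

// Hypothetical helper: pick a response delay around `baseMs`, spread
// uniformly by +/- `varianceMs`. `rand` is injectable so it can be tested.
function pickDelay(baseMs, varianceMs, rand = Math.random) {
  const offset = (rand() * 2 - 1) * varianceMs;
  return Math.max(0, Math.round(baseMs + offset));
}

// Hypothetical mock API: answers every request after a configurable delay
// and fails a configurable fraction of requests with a 500.
function createMockApi({ baseMs = 250, varianceMs = 0, failureRate = 0 } = {}) {
  return http.createServer((req, res) => {
    setTimeout(() => {
      const failed = Math.random() < failureRate;
      res.writeHead(failed ? 500 : 200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ ok: !failed }));
    }, pickDelay(baseMs, varianceMs));
  });
}

// Usage (port 4000 is an arbitrary choice):
// createMockApi({ baseMs: 250, varianceMs: 50, failureRate: 0.01 }).listen(4000);
```

Running the mock as its own process keeps it behaving like an external server, which is the property the slide calls out.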
  6. THE SETUP

    TESTED WITH APACHE BENCHMARK
    20X CONCURRENCY
    FOCUS ON 50TH PERCENTILE
  7. THE SETUP

    TESTED A SINGLE APPLICATION
    IN REALITY, MANY ON ONE MACHINE
  8. M.BOX RELEASE CANDIDATE 1.0

    (20X CONCURRENCY)
  9. M.BOX RELEASE CANDIDATE 1.0

    2533MS (20X CONCURRENCY)
  10. MAXSOCKETS

    “DETERMINES HOW MANY CONCURRENT SOCKETS THE AGENT CAN HAVE OPEN PER ORIGIN” (NODE.JS DOCS)
  11. MAXSOCKETS DEFAULT VALUE

    IO.JS? INFINITY
    NODE V0.12? INFINITY
    NODE V0.10? …5
  12. REMOVING THE MAXSOCKETS LIMIT

    var http = require('http');
    http.globalAgent.maxSockets = Infinity;
    // or…
    var https = require('https');
    https.globalAgent.maxSockets = Infinity;
  13. M.BOX RC 1.0.1 - W/ GLOBAL MAXSOCKETS FIX

    ~2533MS (20X CONCURRENCY)
    NO SIGNIFICANT CHANGE
  14. $ strace -c node server.js

    TIME    SECONDS    CALLS   SYSCALL
    ------  ---------  ------  ----------
    44.17   0.003208    3526   EPOLL_WAIT
    20.27   0.001472   20037   FUTEX
    11.63   0.000845    1363   WRITE
     9.42   0.000684    7624   READ
  16. THE API IS ONLY TAKING 250MS…

    REQUESTS ARE TAKING 10X TOO LONG…
    HALF OF OUR TIME IS SPENT WAITING…
  17. THE API IS ONLY TAKING 250MS…

    REQUESTS ARE TAKING 10X TOO LONG…
    HALF OF OUR TIME IS SPENT WAITING…
    REMOVING COMPUTATION DOES NOTHING…
  18. THE API IS ONLY TAKING 250MS…

    REQUESTS ARE TAKING 10X TOO LONG…
    HALF OF OUR TIME IS SPENT WAITING…
    REMOVING COMPUTATION DOES NOTHING…
    THERE HAS TO BE A BOTTLENECK SOMEWHERE…
  19. REQUEST: THE GOOD

    #6 MOST RELIED UPON NPM MODULE
    SIMPLIFIES A TON OF HTTP REQUEST COMPLEXITY
    FLEXIBLE, A TON OF OPTIONS
  20. REQUEST POOLING

    ALLOWS YOU TO REUSE AGENTS
    KEEP CONNECTIONS OPEN
    ENABLES CONNECTION POOLING
  21. REQUEST: THE PROBLEM

    KEEPS ITS OWN CONNECTION POOL
    BAD DOCUMENTATION (AT THE TIME)
    CREATED ITS OWN NEW AGENTS
    AGENT MAXSOCKETS? 5. OF COURSE.
  22. REMOVE CONNECTION POOLING ALTOGETHER

    request({
      url: 'https://api.box.com/...',
      method: 'GET',
      pool: false
    });
  23. REMOVE THE MAXSOCKETS LIMIT

    var connectionPool = {maxSockets: Infinity};
    request({
      url: 'https://api.box.com/...',
      method: 'GET',
      pool: connectionPool
    });
  24. M.BOX RC 1.1 - MAXSOCKETS FIX

    1077MS (20X CONCURRENCY)
    PREVIOUSLY 2533MS
    57.5% SPEED UP
  25. KEY TAKEAWAYS

    • DON’T LEAVE PROFILING TO THE LAST MINUTE
    • USE (AND LEARN) THE TOOLS AT YOUR DISPOSAL
    • DON'T BE AFRAID TO JUMP INTO OSS AND HELP
    • DON'T LEAVE PROBLEMS UNANSWERED
  26. HANDLEBARS

    <div class="entry">
      <h1>{{title}}</h1>
      <div class="body">
        {{body}}
      </div>
    </div>

    var context = { title: "M.Box", body: "Welcome to Box!" };
    var html = template(context);
  27. HANDLEBARS

    <div class="entry">
      <h1>{{title}}</h1>
      <div class="body">
        {{body}}
      </div>
    </div>

    var context = { title: "M.Box", body: "Welcome to Box!" };
    var html = template(context);

    ~20MS
    <1MS
  28. HANDLEBARS CACHING

    SAVE THESE FUNCTIONS FOR LATER
    MUCH FASTER
    BUT, CAN’T HOT-RELOAD TEMPLATES
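
Saving compiled template functions for later amounts to memoizing the compile step. The sketch below shows the pattern with a toy `compile` function standing in for an expensive compiler such as Handlebars; it is not Handlebars itself, and `render`, `compiledTemplates`, and `compileCount` are hypothetical names. Reading the slide's ~20MS as the compile cost and <1MS as the render cost is an assumption.

```javascript
// Stand-in for an expensive template compiler. It only understands
// {{name}} placeholders; it is NOT Handlebars.
let compileCount = 0;
function compile(source) {
  compileCount += 1;
  return (context) =>
    source.replace(/\{\{(\w+)\}\}/g, (match, key) => String(context[key]));
}

// Cache of compiled template functions, keyed by template source.
const compiledTemplates = new Map();

function render(source, context) {
  let template = compiledTemplates.get(source);
  if (!template) {
    template = compile(source); // pay the compile cost once
    compiledTemplates.set(source, template);
  }
  return template(context); // cheap on every later request
}

const source = '<h1>{{title}}</h1><div class="body">{{body}}</div>';
console.log(render(source, { title: 'M.Box', body: 'Welcome to Box!' }));
console.log(render(source, { title: 'M.Box', body: 'Welcome back!' }));
console.log('compiled', compileCount, 'time(s)');
```

Because this cache never invalidates, edits to a template are not picked up until the process restarts, which is the hot-reload tradeoff the slide mentions.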
  29. M.BOX RC 1.2 - TEMPLATE CACHING FIX

    488MS (20X CONCURRENCY)
    PREVIOUSLY 1077MS
    54.7% SPEED UP
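
The speed-up figures on the result slides, and the "over 80%" in the abstract, follow from the three measured response times. A small worked check, where `percentFaster` is a hypothetical helper name:

```javascript
// Percentage reduction in response time between two measurements.
function percentFaster(beforeMs, afterMs) {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

console.log(percentFaster(2533, 1077).toFixed(1)); // maxSockets fix: 57.5
console.log(percentFaster(1077, 488).toFixed(1));  // template caching fix: 54.7
console.log(percentFaster(2533, 488).toFixed(1));  // both fixes combined: 80.7
```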
  30. NODE.JS BEHAVIOR

    NODE IS SINGLE THREADED
    EVENT DRIVEN
    ASYNCHRONOUS
  31. NODE.JS & THE EVENT LOOP

    ONE LOOP TO RULE THEM ALL
    WAIT OUTSIDE THE LOOP
    RETURN TO THE LOOP ON I/O COMPLETION
  32. NODE.JS & THE EVENT LOOP & COMPUTATION

    COMPUTATION BLOCKS
    HOLDS UP THE LOOP
    HOLDS UP OTHER REQUESTS
    COMPOUNDING
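
How computation holds up the loop can be seen with a timer and a busy-wait. This is a minimal sketch: `blockFor` and `measureTimerDelay` are hypothetical helpers, and the busy-wait merely stands in for whatever synchronous work a request handler might do.

```javascript
// Busy-wait for `ms` milliseconds: a stand-in for synchronous computation
// running on the main thread.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Schedule a 0 ms timer, then block the loop. Resolves with how late the
// timer callback actually ran, in milliseconds.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 0);
    blockFor(blockMs); // the timer cannot fire until this returns
  });
}

measureTimerDelay(100).then((lateMs) => {
  console.log(`0 ms timer ran after ${lateMs} ms`);
});
```

The timer callback plays the role of another request waiting its turn: it cannot run until the blocking work has finished, and every further piece of blocking work queued ahead of it adds to the wait.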
  33. KEY TAKEAWAYS

    • I/O IS INSANELY CHEAP
    • COMPUTATION IS EXPENSIVE
    • KNOW THE STRENGTHS & WEAKNESSES OF YOUR ENVIRONMENT
  34. MOMENT.JS

    LESS THAN A MILLISECOND EACH
    100 FOR THE ALL FILES PAGE
  35. LIBICU

    USES NATIVE I18N LOGIC
    COMPILED INTO THE V8 ENGINE (C++)
    SUPER FAST
  36. LIBICU

    SUPPORT ADDED IN V0.11
    EXISTS IN CHROME TODAY
    WITHOUT LIBICU: ~8MS
    WITH LIBICU: ~2MS
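
In JavaScript, ICU-backed formatting is exposed through the ECMAScript `Intl` API. The sketch below assumes that is the native path these slides refer to and that the Node build includes ICU; `formatModified` and the chosen date style are illustrative, not taken from the talk.

```javascript
// Build the formatter once and reuse it for every date on the page.
const dateFormatter = new Intl.DateTimeFormat('en-US', {
  year: 'numeric',
  month: 'short',
  day: 'numeric',
  timeZone: 'UTC',
});

// Hypothetical helper: format a "last modified" timestamp for a file listing.
function formatModified(timestampMs) {
  return dateFormatter.format(new Date(timestampMs));
}

console.log(formatModified(Date.UTC(2015, 1, 11)));
```

Reusing one formatter instance matters on a page that formats around a hundred dates, since the per-call cost is what adds up.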
  37. KEY TAKEAWAYS

    • DEATH BY 100 PAPER CUTS
    • IN HOT CODE PATHS, EVERY MS COUNTS
    • NATIVE FEATURES CAN HAVE THE OPTIMIZED ADVANTAGE OVER USER-LAND LIBRARIES
  38. WEBKIT-DEVTOOLS-AGENT

    NOT EASY TO SET UP
    HARD TO KEEP A CONNECTION
    PROTIP: DON’T START SETTING UP AT 4PM ON A FRIDAY
  39. (PROGRAM)

    IDLE TIME
    93% OF OUR TOTAL TIME
  40. UTIL.FORMAT

    NICE STRING FORMATTER
    FOR CONVENIENCE, MOSTLY
    WASTEFUL IN HOT CODE PATHS
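
One way to avoid `util.format` in a hot path is plain string concatenation, which produces the same result without parsing a format string on each call. The function names and the label format below are made up for illustration.

```javascript
const util = require('util');

// Convenient, but the format string is parsed on every call.
function labelWithFormat(name, sizeKb) {
  return util.format('%s (%d KB)', name, sizeKb);
}

// Same output via concatenation, with no format-string parsing.
function labelWithConcat(name, sizeKb) {
  return name + ' (' + sizeKb + ' KB)';
}

console.log(labelWithFormat('report.pdf', 42));
console.log(labelWithConcat('report.pdf', 42));
```

Whether the difference matters depends on how often the code runs, so it is worth profiling before rewriting call sites that are not on a hot path.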
  41. GARBAGE COLLECTION

    CAN BE SIGNIFICANT EVENTS
    BLOCKING!
    CAN WREAK HAVOC WITH HEALTH CHECKS
  42. GARBAGE COLLECTION MITIGATION

    CONFIGURE SLAB_BUFFER_SIZE
    SMALLER, MORE FREQUENT CLEANUP
  43. KEY TAKEAWAYS

    • UNDERSTAND THE PERFORMANCE TRADEOFFS YOU’RE MAKING
    • BE AWARE OF MEMORY USAGE
    • BE AWARE OF GARBAGE COLLECTION
  44. M.BOX RC 1.2

    488MS (20X CONCURRENCY)
  45. WE REALIZED WE HADN’T BEEN COMPLETELY HONEST

    API IN PRODUCTION? OVER HTTPS
    API IN TESTS? OVER HTTP
    NOT A FAIR TEST
  46. M.BOX RC 1.2 W/ SSL

    607MS (20X CONCURRENCY)
    PREVIOUSLY 488MS
    24.4% SLOW DOWN
  47. M.BOX RC 1.2 W/ SSL

    935MS (30X CONCURRENCY)
    PREVIOUSLY 595MS
    57.1% SLOW DOWN
  48. SSL

  49. SSL CIPHERS

    DEFAULTS TO USE DIFFIE-HELLMAN
    DH IS EXPENSIVE
    NEED TO INVESTIGATE OTHER CIPHER SUITES, SECURITY TRADEOFFS, ETC.
  50. SSL CIPHERS & CONNECTION POOLING

    SLOWDOWN TOO DRAMATIC?
    SOMETHING COULD BE BLOCKING POOLING
    NODE V0.12: “PROPER KEEPALIVE SUPPORT”
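
Connection reuse over HTTPS can be requested with a keep-alive agent, so that the TLS handshake is paid once per socket rather than once per request. This is a sketch under assumptions, not the configuration used at Box: `apiAgent` is a hypothetical name, and choosing a different cipher suite is deliberately left out because it is a security decision.

```javascript
const https = require('https');
const tls = require('tls');

// Agent that keeps idle sockets open for reuse and has no per-origin cap.
const apiAgent = new https.Agent({
  keepAlive: true,
  maxSockets: Infinity,
});

// Cipher suite names this Node build supports, as a starting point for
// investigating alternatives.
const available = tls.getCiphers();
console.log(`${available.length} cipher suites available`);

// Hypothetical usage; the elided URL path is kept as it appears on the slides:
// https.get('https://api.box.com/...', { agent: apiAgent }, (res) => res.resume());
```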
  51. SSL CIPHERS

    IN A PERFECT WORLD, NO SSL
    TALK TO A PRIVATE API
    HTTPS WOULD BE UNNECESSARY
  52. KEY TAKEAWAYS

    • ALWAYS BE TRUE TO YOUR LIVE ENV.
    • UNDERSTAND YOUR TECH
    • SSL = COMPUTATION = HARD ON NODE
    • PERFORMANCE IS ALL ABOUT TRADEOFFS
  53. STRESS TESTING

    INCREASE TRAFFIC, CONCURRENCY
    WILL THE SERVERS STAND UP?
  54. CONCURRENCY VS. COMPLETED REQUESTS PER SECOND

    [CHART: COMPLETED REQUESTS PER SECOND (Y-AXIS) AGAINST CONCURRENCY (X-AXIS)]

    CONCURRENCY   REQUESTS PER SECOND
    30            32.82
    40            35.60
    50            37.33
    100           42.90
    200           44.35
    500           43.88
  55. CONCURRENCY VS. COMPLETED REQUESTS PER SECOND

    APPROACHING MAXIMUM EFFICIENCY
  56. STRESS TESTING

    100% EFFICIENCY AT ~200X
    PLATEAUED AT ~44 RPS
    DIDN’T FALL OVER, EVEN AT 500X
    HANDLED ALL REQUESTS SUCCESSFULLY
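
Why a throughput plateau is not the same as being fast can be estimated with Little's law: in a closed-loop load test, mean latency is roughly the number of in-flight requests divided by throughput. The sketch below applies that to the concurrency and requests-per-second figures from the chart slide. It assumes the load generator kept exactly the stated number of requests in flight, and `impliedLatencySeconds` is a hypothetical helper, so the results are estimates rather than measurements from the talk.

```javascript
// Concurrency -> completed requests per second, as reported on the slides.
const results = [
  [30, 32.82], [40, 35.60], [50, 37.33],
  [100, 42.90], [200, 44.35], [500, 43.88],
];

// Little's law for a closed-loop test: mean latency = in-flight / throughput.
function impliedLatencySeconds(concurrency, requestsPerSecond) {
  return concurrency / requestsPerSecond;
}

for (const [concurrency, rps] of results) {
  const seconds = impliedLatencySeconds(concurrency, rps);
  console.log(`${concurrency}x -> ~${seconds.toFixed(1)} s per request`);
}
```

At 30x the estimate is about 0.91 s, close to the 935 ms reported for that concurrency; at 500x it is more than 11 s, even though throughput has barely changed.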
  57. KEY TAKEAWAYS

    • DIDN’T FALL OVER!!!
    • 100% EFFICIENCY ISN’T FAST OR DESIRED
    • IN PRODUCTION, KEEP A BUFFER