Real world HTTP benchmarking, Lessons learned

The TechEmpower Framework Benchmark is a public comparison of more than 200 web frameworks in different languages. The competition is fierce and everyone wants to be ranked at the top!

Eclipse Vert.x is a popular reactive stack for the JVM, designed for highly scalable applications. It has taken part in this competition for several years.

Performance benchmarks are often used to compare HTTP servers or web frameworks, and people often rely on them to choose between implementations. We will look at what these benchmarks mean and what they actually measure.

The presentation will explain the secret sauce powering Vert.x performance that has a direct impact on this benchmark, from Java just-in-time compiler to networking optimisations.

Julien Viet

November 07, 2019

Transcript

  1. Real world HTTP benchmarking Lessons learned Julien Viet

  2. Julien Viet. Open source developer for 16+ years. @vertx_project lead.

    Principal software engineer at
    Marseille JUG Leader
    https://www.julienviet.com/
    http://github.com/vietj
    @julienviet
    https://www.mixcloud.com/cooperdbi/
  4. Framework Benchmark ✓ 464 frameworks / 26 languages ✓ 5 tests ✓ Strict requirements ✓ Physical server or cloud ✓ Continuous benchmarking

    https://dzone.com/articles/five-facts-you-might-not-know-about-techempower-fr
  5. A toolkit for building reactive applications in Java

    10K ⋆ on GitHub
    Powered by
    https://vertx.io
    @vertx_project
  6. 5 free ebook codes to win NOW!!! Tweet and mention

    @vertx_project Get 50% off with tsvertx
  7. Round #8 (2013)

  8. Round #14 (2017)

  9. Round #14

  10. Round #14

  11. Round #14

  12. Round #14

  14. ✓ Benchmarking is not a simulation

  15. ✓ Benchmarking is not a simulation ✓ Measure, don't guess

  16. ✓ Benchmarking is not a simulation ✓ Measure, don't guess ✓ Use a baseline

  17. ✓ Benchmarking is not a simulation ✓ Measure, don't guess ✓ Use a baseline ✓ Define expectations
  18. Tools of the trade: async-profiler, perf/dtrace, JVM logs, Flame Graphs, JITWatch
  19. Plaintext benchmark

  20. Plaintext benchmark ✓ Synchronous static Hello World response ✓ HTTP pipelining: 16 ✓ Best of (256, ..., 16384) connections
  21. Batch flushes appropriately ◦ to amortise costs

  22. [Diagram] keep-alive: GET, OK, GET, OK, GET, OK ◦ pipelining: GET, GET, GET, then OK, OK, OK
  23. Default pipelining throughput (requests/second) ✓ Pipelining 1: 59,782 ✓ Pipelining 2: 74,195 ✓ Pipelining 4: 79,037 ✓ Pipelining 8: 82,122
  24. [Diagram] The pipelined exchange GET, GET, GET / OK, OK, OK shown twice: once with an immediate flush per response, once with batched flushes
  25. Optimised pipelining requests/second (default ◦ optimised) ✓ Pipelining 1: 59,782 ◦ 57,123 ✓ Pipelining 2: 74,195 ◦ 97,329 ✓ Pipelining 4: 79,037 ◦ 217,110 ✓ Pipelining 8: 82,122 ◦ 293,381
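
The batching idea on the slides above can be illustrated with a small sketch. It is not Vert.x or Netty code: `CountingSink` and `FlushBatching` are hypothetical names, and the sink only counts flushes as a stand-in for the cost of pushing bytes to a socket. In Netty, a common way to get the batched behaviour is to call `ctx.write()` for each response and `ctx.flush()` once in `channelReadComplete`; whether Vert.x does exactly that is not stated in the deck.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a socket: counts bytes written and flushes requested. */
final class CountingSink extends OutputStream {
  long bytesWritten;
  int flushes;

  @Override public void write(int b) { bytesWritten++; }
  @Override public void write(byte[] b, int off, int len) { bytesWritten += len; }
  @Override public void flush() { flushes++; }
}

final class FlushBatching {
  static final byte[] RESPONSE =
      "HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\nHello, World!"
          .getBytes(StandardCharsets.US_ASCII);

  /** Answers a batch of pipelined requests and returns how many flushes were needed. */
  static int respond(int pipelinedRequests, boolean batched) {
    CountingSink sink = new CountingSink();
    for (int i = 0; i < pipelinedRequests; i++) {
      sink.write(RESPONSE, 0, RESPONSE.length);
      if (!batched) {
        sink.flush(); // immediate flush: one flush per response
      }
    }
    if (batched) {
      sink.flush(); // batched flush: one flush for the whole batch
    }
    return sink.flushes;
  }

  public static void main(String[] args) {
    for (int depth : new int[] {1, 2, 4, 8, 16}) {
      System.out.println("pipelining " + depth
          + ": immediate=" + respond(depth, false)
          + " batched=" + respond(depth, true));
    }
  }
}
```

With immediate flushing the number of flushes grows with the pipelining depth, while the batched variant stays at one per batch, which is the cost the slide title calls amortised.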
  26. Keep your methods small ◦ to ease method inlining

  27. 3% penalty? ad591ec985bcc9f99be173c2ce1c18e350a662f2

  28. Just In Time compilation ✓ Translate Java bytecode to native code ✓ Optimise only for the hot path ✓ Kinds of optimisations: method inlining, loop hoisting, dead code elimination, etc.
  30. process error

  31. process request process error

  32. process request process body process error

  33. handleError(request)

  34. handleContent((HttpContent) msg) handleError(request)

  35. reduce method size to favour inlining handleContent((HttpContent) msg) handleError(request)

  36. inlining manually handleContent((HttpContent) msg) handleError(request)
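
A minimal sketch of the "keep methods small" advice, assuming a made-up `Request` type and handler names (none of this is the actual Vert.x code). The hot method only dispatches, and the rarely executed error handling lives in its own method so it does not add to the hot method's bytecode size. On HotSpot the relevant thresholds are `-XX:MaxInlineSize` (35 bytes of bytecode by default) and `-XX:FreqInlineSize` (325 bytes by default on common 64-bit builds), and inlining decisions can be inspected with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` or with JITWatch.

```java
final class InliningSketch {
  /** Hypothetical request type, only here to make the example self-contained. */
  static final class Request {
    final String path;
    final boolean valid;

    Request(String path, boolean valid) {
      this.path = path;
      this.valid = valid;
    }
  }

  // Hot path: only a branch and a call, so its bytecode stays small.
  static String handle(Request request) {
    if (request.valid) {
      return handleRequest(request);
    }
    return handleError(request); // cold path kept in a separate method
  }

  static String handleRequest(Request request) {
    return "200 " + request.path;
  }

  // Rare, bulkier work stays out of handle() so it does not inflate the hot method.
  static String handleError(Request request) {
    StringBuilder sb = new StringBuilder("400 Bad Request");
    if (request.path == null || request.path.isEmpty()) {
      sb.append(": missing path");
    } else {
      sb.append(": ").append(request.path);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(handle(new Request("/plaintext", true)));
    System.out.println(handle(new Request("", false)));
  }
}
```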

  37. Avoid unnecessary allocation ◦ to reduce GC pressure

  38. class VertxHandler extends ChannelDuplexHandler {
        ...
        // Process Netty's messages
        void channelRead(ChannelHandlerContext ctx, Object msg) {
          context.executeFromIO(() -> {
            conn.startRead();
            handleMessage(conn, msg);
          });
        }
      }

      interface Context {
        ...
        void executeFromIO(Runnable handler);
        ...
      }
  39. class VertxHandler extends ChannelDuplexHandler {
        ...
        // Process Netty's messages
        void channelRead(ChannelHandlerContext ctx, Object msg) {
          // The lambda captures msg, so it is instantiated for each call
          context.executeFromIO(() -> {
            conn.startRead();
            handleMessage(conn, msg);
          });
        }
      }

      interface Context {
        ...
        void executeFromIO(Runnable handler);
        ...
      }

      Instantiate the lambda for each call
  40. class VertxHandler extends ChannelDuplexHandler {
        ...
        // Process Netty's messages
        void channelRead(ChannelHandlerContext ctx, Object msg) {
          context.executeFromIO(() -> {
            conn.startRead();
            handleMessage(conn, msg);
          });
        }
      }

      interface Context {
        ...
        void executeFromIO(Runnable handler);
        // New overload: the message is passed as an argument instead of being captured
        <T> void executeFromIO(T msg, Consumer<T> handler);
      }
  41. class VertxHandler extends ChannelDuplexHandler {
        ...
        // Process Netty's messages
        void channelRead(ChannelHandlerContext ctx, Object msg) {
          // msg is passed as an argument rather than captured by the lambda
          context.executeFromIO(msg, message -> {
            conn.startRead();
            handleMessage(conn, message);
          });
        }
      }

      interface Context {
        ...
        void executeFromIO(Runnable handler);
        <T> void executeFromIO(T msg, Consumer<T> handler);
      }

      Non capturing lambda instantiated once
  42. class VertxHandler extends ChannelDuplexHandler {
        ...
        // Process Netty's messages
        void channelRead(ChannelHandlerContext ctx, Object msg) {
          context.executeFromIO(msg, handler);
        }

        // Created once per VertxHandler instance and reused for every message
        private Handler<Object> handler = message -> {
          conn.startRead();
          handleMessage(conn, message);
        };
      }

      Lambda can become a field
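
The allocation difference between these variants can be observed with a small self-contained sketch. Everything here is hypothetical scaffolding rather than Vert.x code: `Context` is a stand-in that records the identity of each handler object it receives, and `LambdaAllocationSketch` plays the role of the Netty handler. One detail worth keeping in mind: a lambda that touches instance state captures `this`, so holding the handler in a field is the variant that guarantees a single instance regardless of what the JVM does at the call site.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Consumer;

final class LambdaAllocationSketch {
  /** Hypothetical stand-in for the Context on the slides; records each handler instance by identity. */
  static final class Context {
    final Set<Object> handlers = Collections.newSetFromMap(new IdentityHashMap<>());

    void execute(Runnable task) {
      handlers.add(task);
      task.run();
    }

    <T> void execute(T msg, Consumer<T> handler) {
      handlers.add(handler);
      handler.accept(msg);
    }
  }

  final Context context = new Context();
  int handled;

  // Created once per instance: every call below passes this same object.
  private final Consumer<Object> handler = this::handleMessage;

  void handleMessage(Object msg) {
    handled++;
  }

  // Captures msg (and this), so a fresh object is typically created on every call.
  void readCapturing(Object msg) {
    context.execute(() -> handleMessage(msg));
  }

  // Passes msg as an argument and reuses the handler stored in the field.
  void readWithField(Object msg) {
    context.execute(msg, handler);
  }

  /** Runs one variant and returns how many distinct handler objects the context saw. */
  static int distinctHandlers(boolean useField, int messages) {
    LambdaAllocationSketch sketch = new LambdaAllocationSketch();
    for (int i = 0; i < messages; i++) {
      if (useField) {
        sketch.readWithField("msg" + i);
      } else {
        sketch.readCapturing("msg" + i);
      }
    }
    return sketch.context.handlers.size();
  }

  public static void main(String[] args) {
    System.out.println("capturing lambda: " + distinctHandlers(false, 1000) + " handler objects");
    System.out.println("field handler:    " + distinctHandlers(true, 1000) + " handler objects");
  }
}
```

The field variant always reports a single handler object. The count for the capturing variant depends on the JVM, since the language specification allows but does not require a new object per evaluation.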
  44. ✓ Minimize flushing

  45. ✓ Minimize flushing ✓ Optimise for the Just In Time compiler

  46. ✓ Minimize flushing ✓ Optimise for the Just In Time compiler ✓ Keep GC cool
  47. Round #15

  48. Database benchmarks

  49. Database benchmarks ✓ 4 benchmarks: db, queries, fortunes and updates

    ✓ MySQL, PostgreSQL or MongoDB ✓ 256 connections
  50. At round #14 ✓ JDBC + HikariCP gives best performance in Java ✓ Vert.x uses JDBC with a worker pool ✓ Blocking is actually not an issue(???)
  51. Handling the problem ✓ Focus on PostgreSQL ✓ Bad results

    were actually due to mistakes - Missing MongoDB $id for index resulting in bad perf - No usage of a transaction in UPDATES causing abysmal results
  52. The reactive PostgreSQL client ✓ Goals - Simple, clean and

    straightforward API - Non blocking - Performance - Lightweight ✓ Non goals - A driver - An abstraction
  53. Round trips to PostgreSQL [Diagram: query, result, query, result]

  54. Pipelining to increase concurrency [Diagram: query, query, query, then result, result, result]

  55. Running 5000 queries (100µs ping) [Chart: total time in seconds (axis 0 to 0.4) against pipelining level (1, 2, 4, 8, 16); series: JDBC, Reactive client]

  56. Running 5000 queries (1ms ping) [Chart: total time in seconds (axis 0 to 14) against pipelining level (1, 2, 4, 8, 16); series: JDBC, Reactive client]
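
Why pipelining helps more as latency grows can be seen from a back-of-the-envelope model. The sketch below is an assumption-laden lower bound, not a reproduction of the measurements on the charts: it treats "ping" as the full round-trip time, ignores server-side processing, and assumes queries are sent in batches of `level` that each cost one round trip. `PipeliningModel` and its method are illustrative names.

```java
final class PipeliningModel {
  /**
   * Lower bound on total time in microseconds, assuming one network round trip
   * per batch of `level` queries and no server-side processing time.
   */
  static long estimateMicros(int queries, int level, long roundTripMicros) {
    long batches = (queries + level - 1) / level; // ceiling division
    return batches * roundTripMicros;
  }

  public static void main(String[] args) {
    int queries = 5000;
    for (long rtt : new long[] {100, 1000}) {
      for (int level : new int[] {1, 2, 4, 8, 16}) {
        System.out.println("rtt=" + rtt + "us level=" + level
            + " total=" + estimateMicros(queries, level, rtt) + "us");
      }
    }
  }
}
```

Under this model, a client limited to one in-flight query per connection pays one round trip per query, so its total time scales with latency, while a pipelining client divides that cost by the pipelining level.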
  57. Round #15

  58. CI - Java - unofficial

  59. The reactive SQL client ✓ Part of the Vert.x stack since 3.8 as SQL Client ✓ Supports more databases - PostgreSQL - MySQL - SQLServer soon
  67. Let there be pipelining

  68. What did we learn? ✓ TFB does not favour non-blocking designs ✓ JVM is a great place for performance ✓ Trade-offs between usability and performance ✓ RDBMS protocol design is a bottleneck ✓ Protocol concurrency matters
  69. TechEmpower Framework Benchmarks https://www.techempower.com/benchmarks/

    Reactive PostgreSQL Client https://github.com/eclipse-vertx/vertx-sql-client
    Async Profiler https://github.com/jvm-profiling-tools/async-profiler
    Flame Graphs https://github.com/brendangregg/FlameGraph
    Jitwatch https://github.com/AdoptOpenJDK/jitwatch