Real-world HTTP performance benchmarking, lessons learned

The TechEmpower Framework Benchmark is a public comparison of more than 200 web frameworks across different languages. The competition is fierce, and everyone wants to rank at the top!

Eclipse Vert.x is a popular reactive stack for the JVM, designed for highly scalable applications, and it has taken part in this competition for several years.

Performance benchmarks are often used to compare HTTP servers and web frameworks, and people often rely on them when choosing between implementations. We will look at what these benchmarks mean and what they actually measure.

The presentation will explain the secret sauce powering Vert.x performance that has a direct impact on this benchmark, from the Java just-in-time compiler to networking optimizations.

Julien Viet

October 18, 2018

Transcript

  1. Real-world HTTP performance benchmarking, lessons learned Julien Viet QCon Shanghai 18-10-2018

  2. Once upon a time

  3. Round #8

  4. Everyone was happy

  5. But one day...

  6. Round #14

  7. Round #14

  8. Round #14

  9. Round #14

  10. Round #14

  11. Real-world HTTP performance benchmarking, lessons learned

  12. Julien Viet Open source developer for 16+ years @vertx_project lead Principal software engineer Marseille JUG Leader https://www.julienviet.com/ http://github.com/vietj @julienviet https://www.mixcloud.com/cooperdbi/

  13. Eclipse Vert.x Open source project started in 2012 Eclipse / Apache licensing A toolkit for building reactive applications for the JVM 8K ⋆ on GitHub Built on top of Netty https://vertx.io @vertx_project

  14. TechEmpower Framework Benchmark ✓ Performance of production-grade deployments of real-world application frameworks and platforms ✓ 464 frameworks - 26 languages ✓ Community of contributors on GitHub ✓ Physical server or cloud (Azure)

  15. 6 benchmarks ✓ "/plaintext", "/json" ✓ "/db", "/queries", "/updates", "/fortunes"

  16. Things to remember ✓ Benchmarking is hard ✓ Benchmarking is NOT load testing ✓ Measure, don't guess ✓ Be critical

  17. The lab

  18. None
  19. None
  20. None
  21. /plaintext

  22. Benchmark ✓ Simple Hello World ✓ 16,384 concurrent connections ✓ HTTP pipelining (16) ✓ No back-end ✓ Heavily CPU bound

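For reference, the /plaintext test boils down to a handler like the following. This is a minimal Vert.x HTTP server sketch, not the actual benchmark source (which also tunes server options and deploys one verticle per event-loop core):

    import io.vertx.core.Vertx;

    public class PlaintextServer {
      public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        vertx.createHttpServer()
          .requestHandler(req -> req.response()
            .putHeader("Content-Type", "text/plain")
            .end("Hello, World!"))      // the whole "application" under test
          .listen(8080);
      }
    }

With no back-end call behind it, almost all the time is spent in the server itself, which is why the optimizations on the following slides matter so much here.
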
  23. GET OK PUT OK GET OK Keep-alive

  24. Head of line blocking

  25. PUT OK GET OK GET OK Pipelining

  26. Our weapons ✓ Async-profiler with flame graphs ✓ JITWatch ✓ Wireshark

  27. Code inlining

  28. process request process body process error

  29. process request process body process error

  30. process request process body process error reduce method size to favor inlining

  31. process request process body process error 2. inline by hand b2073fa091d64a1dfe06699bca1a8befddb5a805

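HotSpot only inlines methods whose bytecode is small enough (the -XX:MaxInlineSize and -XX:FreqInlineSize thresholds), so one big method that mixes the hot request path with rarely taken body and error branches can defeat inlining. A minimal illustrative sketch of the idea, not the actual Vert.x code:

    // Illustrative sketch only, not the Vert.x sources.
    class RequestProcessor {

      interface HttpRequest { }   // placeholder type for the sketch

      // The hot method stays tiny: it only dispatches, so it fits under the
      // JIT inlining thresholds and the request path compiles as one unit.
      void processMessage(Object msg) {
        if (msg instanceof HttpRequest) {
          processRequest((HttpRequest) msg);
        } else {
          processOther(msg);      // cold paths split out of the hot method
        }
      }

      private void processRequest(HttpRequest request) {
        // hot path: process the request
      }

      private void processOther(Object msg) {
        // body chunks, errors, ... rarely taken in the plaintext benchmark
      }
    }
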
  32. Batch to amortise costs

  33. // class VertxHandler void channelRead(Object msg) { Connection conn = getConnection(); Context context = conn.getContext(); context.executeFromIO(conn::startRead); channelRead(conn, msg); } // class VertxHttpHandler extends VertxHandler void channelRead(Connection conn, Object msg) { context.executeFromIO(() -> { conn.handleMessage(msg); }); } chctx.fireChannelRead(msg) void startRead() { ... } void handleMessage(Object msg) { ... }

  34. // class VertxHandler void channelRead(Object msg) { Connection conn = getConnection(); Context context = conn.getContext(); context.executeFromIO(conn::startRead); channelRead(conn, msg); } // class VertxHttpHandler extends VertxHandler void channelRead(Connection conn, Object msg) { context.executeFromIO(() -> { conn.handleMessage(msg); }); } chctx.fireChannelRead(msg) void startRead() { ... } void handleMessage(Object msg) { ... } Can't be inlined

  35. // class VertxHandler public void channelRead(ChannelHandlerContext chctx, Object msg) { Connection conn = getConnection(); Context context = conn.getContext(); context.executeFromIO(() -> { conn.startRead(); conn.handleMessage(msg); }); } chctx.fireChannelRead(msg) void startRead() { } void handleMessage(Object msg) { ... } Batch here 799df9e602eabcd51b56052e20cc7d05134ff901

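Laid out readably, the change on slides 33-35 batches the two per-read context dispatches into a single executeFromIO(...) call. The types below are simplified stand-ins, not the real Vert.x classes:

    // Readable sketch of the idea on the slides (simplified types).
    class VertxHandlerSketch {

      interface Context { void executeFromIO(Runnable task); }
      interface Connection { Context getContext(); void startRead(); void handleMessage(Object msg); }

      private final Connection conn;

      VertxHandlerSketch(Connection conn) { this.conn = conn; }

      // Before: two context dispatches per channelRead().
      void channelReadBefore(Object msg) {
        Context context = conn.getContext();
        context.executeFromIO(conn::startRead);
        context.executeFromIO(() -> conn.handleMessage(msg));
      }

      // After: one dispatch doing both steps, amortising the cost of hopping
      // onto the Vert.x context and giving the JIT a simpler call site.
      void channelReadAfter(Object msg) {
        Context context = conn.getContext();
        context.executeFromIO(() -> {
          conn.startRead();
          conn.handleMessage(msg);
        });
      }
    }
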
  36. The fastest code is the code that never runs

  37. req.response() .end("Hello World"); Netty Vert.x Application

  38. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } req.response() .end("Hello World"); Netty Vert.x Application

  39. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; channel.write(encode(msg)); } req.response() .end("Hello World"); Netty Vert.x Application

  40. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; channel.write(encode(msg)); } ChannelFuture write(Object msg) { return pipeline.write(msg); } req.response() .end("Hello World"); Netty Vert.x Application

  41. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; channel.write(encode(msg)); } // default implementation (inherited) void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) { ctx.write(msg, promise); } ChannelFuture write(Object msg) { return pipeline.write(msg); } req.response() .end("Hello World"); Netty Vert.x Application

  42. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; channel.write(encode(msg)); } // default implementation (inherited) void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) { ctx.write(msg, promise); } ChannelFuture write(Object msg) { return pipeline.write(msg); } void write(Object msg, ChannelPromise promise) { next.invoke(msg, promise) } req.response() .end("Hello World"); Netty Vert.x Application

  43. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; channel.write(encode(msg)); } // default implementation (inherited) void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) { ctx.write(msg, promise); } ChannelFuture write(Object msg) { return pipeline.write(msg); } void write(Object msg, ChannelPromise promise) { next.invoke(msg, promise) } req.response() .end("Hello World"); Netty Vert.x Application

  44. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; chctx.write(encode(msg)); } void write(Object msg, ChannelPromise promise) { next.invoke(msg, promise) } req.response() .end("Hello World"); The fastest code is the code that never runs 217b17c78cd54103ae98557510a7ac431e17c5ea Netty Vert.x Application

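What slide 44 changes is where the write enters the Netty pipeline: Channel.write(...) starts at the tail and traverses every outbound handler, while writing on the HTTP handler's own ChannelHandlerContext starts at that handler's position, so everything between the tail and that handler simply never runs. A standalone EmbeddedChannel sketch of that difference (illustrative, not the Vert.x code):

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelOutboundHandlerAdapter;
    import io.netty.channel.ChannelPromise;
    import io.netty.channel.embedded.EmbeddedChannel;

    public class WritePathDemo {

      // Logs every outbound write that reaches it, then passes it on.
      static final class LoggingHandler extends ChannelOutboundHandlerAdapter {
        private final String name;
        LoggingHandler(String name) { this.name = name; }
        @Override
        public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
          System.out.println(name + " sees: " + msg);
          ctx.write(msg, promise);
        }
      }

      public static void main(String[] args) {
        LoggingHandler encoder   = new LoggingHandler("encoder (head side)");
        LoggingHandler vertxLike = new LoggingHandler("vertx-like handler");
        LoggingHandler tailSide  = new LoggingHandler("tail-side handler");
        EmbeddedChannel ch = new EmbeddedChannel(encoder, vertxLike, tailSide);

        // Channel.write(): enters the pipeline at the tail and traverses every
        // outbound handler: tail-side, vertx-like and encoder all run.
        ch.writeAndFlush("via channel");

        // Writing on the vertx-like handler's own context starts from that
        // handler's position and flows toward the head: the tail-side handler
        // (and the vertx-like handler itself) never run, only the encoder does.
        ChannelHandlerContext chctx = ch.pipeline().context(vertxLike);
        chctx.writeAndFlush("via handler context");
      }
    }
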
  45. Reduce object allocation

  46. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; chctx.write(encode(msg)); } void write(Object msg) { write(msg, newPromise()); } void write(Object msg, ChannelPromise promise) { next.invoke(msg, promise) } req.response() .end("Hello World"); Netty Vert.x Application

  47. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; chctx.write(encode(msg)); } void write(Object msg) { write(msg, newPromise()); } void write(Object msg, ChannelPromise promise) { next.invoke(msg, promise) } req.response() .end("Hello World"); Netty Vert.x Application Allocates a promise that is never used

  48. void end(Buffer buffer) { FullHttpResponse msg = ... queueForWrite(msg); } void queueForWrite(Object msg) { needsFlush = true; chctx.write(msg, channel.voidPromise()); } void write(Object msg, ChannelPromise promise) { next.invoke(msg, promise) } req.response() .end("Hello World"); Netty Vert.x Use a singleton VoidPromise instead 6b9788dec6e1147782a3a7017ead067778095cba Application

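Every plain write(msg) in Netty allocates a fresh ChannelPromise even when nobody ever listens to it; passing the channel's singleton void promise avoids that per-write allocation whenever the write result is ignored. A small standalone sketch (illustrative; the actual change is the commit referenced on slide 48):

    import io.netty.buffer.Unpooled;
    import io.netty.channel.ChannelOutboundHandlerAdapter;
    import io.netty.channel.embedded.EmbeddedChannel;
    import java.nio.charset.StandardCharsets;

    public class VoidPromiseDemo {
      public static void main(String[] args) {
        EmbeddedChannel ch = new EmbeddedChannel(new ChannelOutboundHandlerAdapter());

        // Default form: this write allocates a new promise internally,
        // even though no listener is ever registered on it.
        ch.write(Unpooled.copiedBuffer("hello", StandardCharsets.UTF_8));

        // Void promise: a per-channel singleton that cannot carry listeners,
        // so nothing is allocated per write. Only valid when the result is ignored.
        ch.write(Unpooled.copiedBuffer("world", StandardCharsets.UTF_8), ch.voidPromise());

        ch.flush();
      }
    }
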
  49. Cache expensive operations

  50. void setConnection(Connection conn) { this.conn = conn; } void channelReadComplete(ChannelHandlerContext ctx) { Runnable task = conn::endReadAndFlush; // Need to use executeFromIO to avoid race conditions context.executeFromIO(task); } void endReadAndFlush() { if (needFlush) { needFlush = false; channel.flush(); } } Vert.x

  51. void setConnection(Connection conn) { this.conn = conn; } void channelReadComplete(ChannelHandlerContext ctx) { Runnable task = conn::endReadAndFlush; // Need to use executeFromIO to avoid race conditions context.executeFromIO(task); } void endReadAndFlush() { if (needFlush) { needFlush = false; channel.flush(); } } Vert.x Instantiates the lambda for each flush

  52. void setConnection(Connection conn) { this.conn = conn; this.task = conn::endReadAndFlush; } void channelReadComplete(ChannelHandlerContext ctx) { // Need to use executeFromIO to avoid race conditions context.executeFromIO(task); } void endReadAndFlush() { if (needFlush) { needFlush = false; channel.flush(); } } Create the lambda when the connection is created, once Vert.x

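The same pattern reduced to its essence: because the callback captures nothing that changes per event, it can be allocated once when the connection is set up instead of once per read-complete. Simplified types, not the Vert.x sources:

    // Illustrative sketch only.
    class FlushHandlerSketch {

      interface Context { void executeFromIO(Runnable task); }
      interface Connection { void endReadAndFlush(); }

      private final Context context;
      private Runnable flushTask;          // created once per connection

      FlushHandlerSketch(Context context) { this.context = context; }

      void setConnection(Connection conn) {
        // Before: channelReadComplete() allocated "conn::endReadAndFlush"
        // on every read-complete event. After: allocate it once, here.
        this.flushTask = conn::endReadAndFlush;
      }

      void channelReadComplete() {
        context.executeFromIO(flushTask);  // no per-event allocation
      }
    }
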
  53. Extra optimisations ✓ Faster HTTP header encoding ✓ Cache complex conditions

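As an illustration of the header-encoding point (not the actual Vert.x commit): constant header names and values can be cached as Netty AsciiString instances, which wrap a byte[] and can therefore be written to the wire by copying the backing bytes instead of re-encoding a java.lang.String on every response:

    import io.netty.util.AsciiString;

    // Illustrative only: cached, pre-encoded header constants.
    public final class CachedHeaders {
      public static final CharSequence CONTENT_TYPE = AsciiString.cached("content-type");
      public static final CharSequence TEXT_PLAIN   = AsciiString.cached("text/plain");
      public static final CharSequence SERVER       = AsciiString.cached("server");
      public static final CharSequence VERTX        = AsciiString.cached("vert.x");

      private CachedHeaders() { }
    }

A handler can then pass these constants to the CharSequence overloads of putHeader instead of plain String literals.
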
  54. Round #15

  55. /db benchmark

  56. /db ✓ Choice to use PostgreSQL ✓ Determine the actual bottleneck: CPU? Network? Database? ✓ 256 concurrent connections: non-blocking versus blocking

  57. The reactive PostgreSQL client ✓ Goals - Simple, clean and straightforward API - Performant - Be a client - Lightweight ✓ Non-goals - Be a driver - Be an abstraction

  58. // Connect directly PgClient.connect(uri, connection -> { // Handle result }); // Or create a pool of connections PgClient pool = PgClient.pool(uri); pool.getConnection(connection -> { // Handle result });

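Slide 58 laid out readably (same code, just reformatted; exact overloads vary between reactive-pg-client versions, and uri stands for a PostgreSQL connection URI):

    // Connect directly
    PgClient.connect(uri, connection -> {
      // Handle result
    });

    // Or create a pool of connections
    PgClient pool = PgClient.pool(uri);
    pool.getConnection(connection -> {
      // Handle result
    });
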
  59. // Sequential queries connection.query(query1, result1 -> { // Got result 1 connection.query(query2, result2 -> { // Got result 2 }); });

  60. // What if we do ? connection.query(query1, result1 -> { // Got result 1 }); connection.query(query2, result2 -> { // Got result 2 }); - the two queries execute concurrently? - query1 executes, then query2? - query1 executes, query2 executes after? QUIZ

  61. Head of line blocking ✓ PostgreSQL processes one request at a time ✓ It sends the response after processing ✓ Sounds familiar?

  62. Let's pipeline it!

  63. Other cool features ✓ Direct memory-to-object decoding without an intermediary memory copy ✓ Efficient flushing to minimise expensive system calls ✓ RxJava 1 & 2 ✓ Domain sockets / SSL / Proxy

  64. Round #15

  65. None
  66. None
  67. None
  68. None
  69. None
  70. None
  71. None
  72. Let there be pipelining

  73. None
  74. None
  75. ✓ https://github.com/AdoptOpenJDK/jitwatch ✓ https://github.com/jvm-profiling-tools/async-profiler ✓ https://reactiverse.io/reactive-pg-client/