Real world HTTP benchmarking, Lessons learned

Julien Viet
November 07, 2019

The TechEmpower Framework Benchmark is a public comparison of more than 200 web frameworks in different languages. The competition is fierce, and everyone wants to be ranked at the top!

Eclipse Vert.x is a popular reactive stack for the JVM, designed for highly scalable applications, and it has taken part in this competition for several years.

Performance benchmarks are often used to compare HTTP servers and web frameworks, and people often rely on them to choose between implementations. We will look at what these benchmarks mean and what they actually measure.

The presentation explains the secret sauce powering Vert.x performance that has a direct impact on this benchmark, from the Java just-in-time compiler to networking optimisations.

Transcript

  1. Real world HTTP benchmarking
    Lessons learned
    Julien Viet

  2. Julien Viet
    Open source developer for 16+ years
    @vertx_project lead
    Principal software engineer at
    Marseille JUG Leader
    Web: https://www.julienviet.com/
    GitHub: http://github.com/vietj
    Twitter: @julienviet
    Mixcloud: https://www.mixcloud.com/cooperdbi/

  3. ✓ 464 frameworks / 26 languages
    ✓ 5 tests
    ✓ Strict requirements
    ✓ Physical server or cloud
    ✓ Continuous benchmarking
    Framework Benchmark
    https://dzone.com/articles/five-facts-you-might-not-know-about-techempower-fr

  4. 10K ★ on GitHub
    Powered by
    Web: https://vertx.io
    Twitter: @vertx_project
    A toolkit for building reactive
    applications in Java

  5. 5 free ebook
    codes to win NOW!!!
    Tweet and mention
    @vertx_project
    Get 50% off
    with tsvertx

  6. Round #8 (2013)

  7. Round #14 (2017)

  8. ✓ Benchmarking is not a
    simulation

  9. ✓ Benchmarking is not a
    simulation
    ✓ Measure, don't guess

  10. ✓ Benchmarking is not a
    simulation
    ✓ Measure, don't guess
    ✓ Use a baseline

    View full-size slide

  11. ✓ Benchmarking is not a
    simulation
    ✓ Measure, don't guess
    ✓ Use a baseline
    ✓ Define expectations

  12. Tools of the trade
    async-profiler
    perf/dtrace
    JVM logs
    Flame Graphs
    jitwatch

  13. Plaintext
    benchmark

  14. Plaintext benchmark
    ✓ Synchronous static Hello World response
    ✓ HTTP pipelining: 16
    ✓ best of (256,...,16384) connections
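
    For reference, a minimal sketch of a Vert.x plaintext endpoint of the kind this test exercises; it is illustrative only (port and response text assumed), not the actual TFB submission:

    import io.vertx.core.Vertx;

    public class PlaintextServer {
      public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        vertx.createHttpServer()
          .requestHandler(req -> req.response()
            .putHeader("Content-Type", "text/plain")
            .end("Hello, World!"))
          .listen(8080);
      }
    }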

  15. Batch flushes appropriately
    ○ to amortise costs
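
    A hedged Netty-style sketch of this idea: queue each pipelined response with write() and flush once per read batch in channelReadComplete(), instead of calling writeAndFlush() per response. The handler and the handleRequest() helper are illustrative, not Vert.x internals:

    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    public class BatchingHandler extends ChannelInboundHandlerAdapter {

      @Override
      public void channelRead(ChannelHandlerContext ctx, Object msg) {
        Object response = handleRequest(msg);
        // Queue the response in the outbound buffer, no flush (and no syscall) yet
        ctx.write(response);
      }

      @Override
      public void channelReadComplete(ChannelHandlerContext ctx) {
        // Flush once for everything queued during this read batch
        ctx.flush();
      }

      private Object handleRequest(Object msg) {
        // Placeholder for request decoding and response building
        return msg;
      }
    }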

  16. keep-alive vs. pipelining
    [Diagram: with keep-alive, the client sends a GET and waits for its OK before sending the next one; with pipelining, several GETs are sent back to back and the OKs come back afterwards]

  17. Default pipelining throughput (requests/second)
    ✓ Pipelining 1: 59,782
    ✓ Pipelining 2: 74,195
    ✓ Pipelining 4: 79,037
    ✓ Pipelining 8: 82,122

  18. immediate flush vs. batched flushes
    [Diagram: with an immediate flush, every OK is written and flushed on its own; with batched flushes, the OKs for a batch of pipelined GETs are written together and flushed once]

  19. Optimised pipelining throughput (requests/second, default → batched flushes)
    ✓ Pipelining 1: 59,782 → 57,123
    ✓ Pipelining 2: 74,195 → 97,329
    ✓ Pipelining 4: 79,037 → 217,110
    ✓ Pipelining 8: 82,122 → 293,381

  20. Keep your methods small
    ○ to ease method inlining

  21. 3% penalty?
    ad591ec985bcc9f99be173c2ce1c18e350a662f2

  22. Just In Time compilation
    ✓ Translate Java bytecode to native code
    ✓ Optimise only for the hot path
    ✓ Kinds of optimisations: method inlining, loop hoisting,
    dead code elimination, etc...

  23. [Diagram: process error]

  24. [Diagram: process request, process error]

  25. [Diagram: process request, process body, process error]

  26. handleError(request)

  27. handleContent((HttpContent) msg)
    handleError(request)

  28. reduce method size
    to favour inlining
    handleContent((HttpContent) msg)
    handleError(request)

  29. inlining manually
    handleContent((HttpContent) msg)
    handleError(request)
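
    A hedged sketch of the pattern behind slides 26-29: keep the hot path short and push the rare error branch into its own method, so the hot method stays under the JIT's inlining thresholds. The names are illustrative, not actual Vert.x code:

    import io.netty.handler.codec.http.HttpContent;

    class MessageHandler {

      // Hot path kept small so the JIT can inline it at the call site
      void handleMessage(Object msg) {
        if (msg instanceof HttpContent) {
          handleContent((HttpContent) msg);
        } else {
          // Rare branch lives in its own method, so its bytecode
          // no longer counts against the size of the hot method
          handleError(msg);
        }
      }

      private void handleContent(HttpContent content) {
        // ... process the body ...
      }

      private void handleError(Object msg) {
        // ... slow path: build and report the error ...
      }
    }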

  30. Avoid unnecessary allocation
    ○ to reduce GC pressure

  31. class VertxHandler extends ChannelDuplexHandler {
      ...
      // Process Netty's messages
      void channelRead(ChannelHandlerContext ctx, Object msg) {
        context.executeFromIO(() -> {
          conn.startRead();
          handleMessage(conn, msg);
        });
      }
    }
    interface Context {
      ...
      void executeFromIO(Runnable handler);
      ...
    }

  32. class VertxHandler extends ChannelDuplexHandler {
      ...
      // Process Netty's messages
      void channelRead(ChannelHandlerContext ctx, Object msg) {
        context.executeFromIO(() -> {
          conn.startRead();
          handleMessage(conn, msg);
        });
      }
    }
    interface Context {
      ...
      void executeFromIO(Runnable handler);
      ...
    }
    The capturing lambda is instantiated for each call

  33. class VertxHandler extends ChannelDuplexHandler {
      ...
      // Process Netty's messages
      void channelRead(ChannelHandlerContext ctx, Object msg) {
        context.executeFromIO(() -> {
          conn.startRead();
          handleMessage(conn, msg);
        });
      }
    }
    interface Context {
      ...
      void executeFromIO(Runnable handler);
      <T> void executeFromIO(T msg, Consumer<T> handler);
    }

  34. class VertxHandler extends ChannelDuplexHandler {
      ...
      // Process Netty's messages
      void channelRead(ChannelHandlerContext ctx, Object msg) {
        context.executeFromIO(msg, message -> {
          conn.startRead();
          handleMessage(conn, message);
        });
      }
    }
    interface Context {
      ...
      void executeFromIO(Runnable handler);
      <T> void executeFromIO(T msg, Consumer<T> handler);
    }
    The non-capturing lambda is instantiated once

  35. class VertxHandler extends ChannelDuplexHandler {
      ...
      // Process Netty's messages
      void channelRead(ChannelHandlerContext ctx, Object msg) {
        context.executeFromIO(msg, handler);
      }
      private final Consumer<Object> handler = message -> {
        conn.startRead();
        handleMessage(conn, message);
      };
    }
    The lambda can become a field

  36. ✓ Minimize flushing

  37. ✓ Minimize flushing
    ✓ Optimise for the Just In Time
    compiler

  38. ✓ Minimize flushing
    ✓ Optimise for the Just In Time
    compiler
    ✓ Keep GC cool

  39. Database
    benchmarks

  40. Database benchmarks
    ✓ 4 benchmarks: db, queries, fortunes and updates
    ✓ MySQL, PostgreSQL or MongoDB
    ✓ 256 connections

  41. At round #14
    ✓ JDBC + HikariCP gives the best performance in Java
    ✓ Vert.x uses JDBC with a worker pool
    ✓ Blocking is actually not an issue (???)

  42. Handling the problem
    ✓ Focus on PostgreSQL
    ✓ Bad results were actually due to mistakes
    - A missing MongoDB $id index resulted in poor performance
    - Not using a transaction in UPDATES caused abysmal results

  43. The reactive PostgreSQL client
    ✓ Goals
    - Simple, clean and straightforward API
    - Non blocking
    - Performance
    - Lightweight
    ✓ Non goals
    - A driver
    - An abstraction

  44. Round trips to PostgreSQL
    [Diagram: each query waits for its result before the next query is sent]

  45. Pipelining to increase concurrency
    [Diagram: several queries are sent back to back on the same connection and the results stream back in order]

  46. [Chart: running 5000 queries with a 100µs ping; total time in seconds (0 to 0.4) vs. pipelining level (1, 2, 4, 8, 16), comparing JDBC and the reactive client]

  47. [Chart: running 5000 queries with a 1ms ping; total time in seconds (0 to 14) vs. pipelining level (1, 2, 4, 8, 16), comparing JDBC and the reactive client]

  48. CI - Java - unofficial

  49. The reactive SQL client
    ✓ Part of the Vert.x stack since 3.8 as the SQL Client
    ✓ Supports more databases
    - PostgreSQL
    - MySQL
    - SQL Server soon
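
    A minimal sketch of querying PostgreSQL with the reactive client, assuming the Vert.x 4.x API shape (PgPool, PgConnectOptions.setPipeliningLimit, preparedQuery(...).execute(...)); the database name, credentials and World table follow the TFB schema and are assumptions, and the exact API may differ from the 3.8-era client shown in the deck:

    import io.vertx.core.Vertx;
    import io.vertx.pgclient.PgConnectOptions;
    import io.vertx.pgclient.PgPool;
    import io.vertx.sqlclient.PoolOptions;
    import io.vertx.sqlclient.Tuple;

    public class PgPipeliningExample {
      public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        PgConnectOptions connectOptions = new PgConnectOptions()
            .setHost("localhost")
            .setDatabase("hello_world")     // assumed TFB-style database name
            .setUser("benchmarkdbuser")     // assumed credentials
            .setPassword("benchmarkdbpass")
            .setPipeliningLimit(16);        // queries that may be in flight on one connection

        PgPool client = PgPool.pool(vertx, connectOptions, new PoolOptions().setMaxSize(4));

        client.preparedQuery("SELECT id, randomnumber FROM world WHERE id = $1")
            .execute(Tuple.of(42), ar -> {
              if (ar.succeeded()) {
                ar.result().forEach(row -> System.out.println(row.getInteger("randomnumber")));
              } else {
                ar.cause().printStackTrace();
              }
              client.close();
              vertx.close();
            });
      }
    }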

  50. Let there be pipelining

  51. What did we learn?
    ✓ TFB does not favour non-blocking designs
    ✓ The JVM is a great place for performance
    ✓ There are trade-offs between usability and performance
    ✓ RDBMS protocol design is a bottleneck
    ✓ Protocol concurrency matters

  52. TechEmpower Framework Benchmarks
    https://www.techempower.com/benchmarks/
    Reactive PostgreSQL Client
    https://github.com/eclipse-vertx/vertx-sql-client
    Async Profiler
    https://github.com/jvm-profiling-tools/async-profiler
    Flame Graphs
    https://github.com/brendangregg/FlameGraph
    Jitwatch
    https://github.com/AdoptOpenJDK/jitwatch
