Slide 1

Slide 1 text

Real-world HTTP benchmarking: lessons learned
Julien Viet

Slide 2

Slide 2 text

Julien Viet
Open source developer for 16+ years
@vertx_project lead
Principal software engineer at
Marseille JUG Leader
https://www.julienviet.com/
http://github.com/vietj
@julienviet
https://www.mixcloud.com/cooperdbi/

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Framework Benchmark
✓ 464 frameworks / 26 languages
✓ 5 tests
✓ Strict requirements
✓ Physical server or cloud
✓ Continuous benchmarking
https://dzone.com/articles/five-facts-you-might-not-know-about-techempower-fr

Slide 5

Slide 5 text

A toolkit for building reactive applications in Java
10K ⋆ on GitHub
Powered by
https://vertx.io
@vertx_project

Slide 6

Slide 6 text

5 free ebook codes to win NOW!!! Tweet and mention @vertx_project Get 50% off with tsvertx

Slide 7

Slide 7 text

Round #8 (2013)

Slide 8

Slide 8 text

Round #14 (2017)

Slide 9

Slide 9 text

Round #14

Slide 10

Slide 10 text

Round #14

Slide 11

Slide 11 text

Round #14

Slide 12

Slide 12 text

Round #14

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

✓ Benchmarking is not a simulation

Slide 15

Slide 15 text

✓ Benchmarking is not a simulation
✓ Measure, don't guess

Slide 16

Slide 16 text

✓ Benchmarking is not a simulation
✓ Measure, don't guess
✓ Use a baseline

Slide 17

Slide 17 text

✓ Benchmarking is not a simulation
✓ Measure, don't guess
✓ Use a baseline
✓ Define expectations

Slide 18

Slide 18 text

Tools of the trade
✓ async-profiler
✓ perf/dtrace
✓ JVM logs
✓ Flame Graphs
✓ JITWatch

Slide 19

Slide 19 text

Plaintext benchmark

Slide 20

Slide 20 text

Plaintext benchmark
✓ Synchronous static Hello World response
✓ HTTP pipelining: 16
✓ Best of (256,...,16384) connections
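With a pipelining depth of 16, the client writes 16 requests back to back without waiting for responses, so a single socket read on the server can contain many complete requests. The sketch below illustrates this by splitting a buffer of body-less GET requests on the blank line that ends each header block. The `/plaintext` path, the `Host` header and the `PipelinedRequests` class are illustrative assumptions; a real HTTP parser must also handle bodies, partial reads and malformed input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds the complete pipelined GET requests in one read buffer.
class PipelinedRequests {

    // Assumed request shape (no body), repeated by a pipelining client.
    static final String REQUEST = "GET /plaintext HTTP/1.1\r\nHost: localhost\r\n\r\n";

    // Returns the request line of every complete request; a trailing partial
    // request (no terminating blank line yet) is left for the next read.
    static List<String> split(String buffer) {
        List<String> requestLines = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = buffer.indexOf("\r\n\r\n", start)) != -1) {
            String headers = buffer.substring(start, end);
            int eol = headers.indexOf("\r\n");
            requestLines.add(eol == -1 ? headers : headers.substring(0, eol));
            start = end + 4; // skip the blank line that ends the header block
        }
        return requestLines;
    }

    public static void main(String[] args) {
        // Pipelining depth 16: sixteen requests arrive in the same buffer.
        String buffer = REQUEST.repeat(16);
        System.out.println(split(buffer).size() + " requests in one read");
    }
}
```

Because all 16 requests are available at once, the server can answer them as a group, which is what makes the flushing strategy matter.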

Slide 21

Slide 21 text

Batch flushes appropriately
○ to amortise costs

Slide 22

Slide 22 text

[Diagram: keep-alive (each GET waits for its OK before the next GET) vs pipelining (several GETs sent back to back, then their OKs)]

Slide 23

Slide 23 text

Default pipelining throughput (requests/second)
✓ Pipelining 1: 59,782
✓ Pipelining 2: 74,195
✓ Pipelining 4: 79,037
✓ Pipelining 8: 82,122

Slide 24

Slide 24 text

[Diagram: pipelined GETs answered with an immediate flush per OK vs batched flushes covering several OKs]

Slide 25

Slide 25 text

Optimised pipelining (requests/second)

Pipelining   Default   Optimised
1            59,782     57,123
2            74,195     97,329
4            79,037    217,110
8            82,122    293,381
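The difference between the two diagrams can be sketched by counting flushes. In this minimal model, assumed for illustration, `CountingStream` stands in for a socket and every flush stands for one write system call. "Immediate" flushes after each response; "batched" writes all responses for the requests that arrived in one read and flushes once at the end, which is the role Netty's `channelReadComplete` callback can play. The class and method names are hypothetical, not Vert.x code.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class FlushBatching {

    // Static "Hello, World!" response, encoded once and reused for every request.
    static final byte[] RESPONSE = ("HTTP/1.1 200 OK\r\n"
            + "Content-Type: text/plain\r\n"
            + "Content-Length: 13\r\n"
            + "\r\n"
            + "Hello, World!").getBytes(StandardCharsets.US_ASCII);

    // Stand-in for a socket: keeps the written bytes and counts flushes.
    static final class CountingStream extends OutputStream {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int flushes;

        @Override public void write(int b) { bytes.write(b); }
        @Override public void write(byte[] b, int off, int len) { bytes.write(b, off, len); }
        @Override public void flush() { flushes++; }
    }

    // Answers 'pipelined' requests that arrived in the same read; returns the flush count.
    // immediate == true: flush after every response.
    // immediate == false: flush once, when the read burst is over.
    static int answerBurst(CountingStream out, int pipelined, boolean immediate) {
        for (int i = 0; i < pipelined; i++) {
            out.write(RESPONSE, 0, RESPONSE.length);
            if (immediate) {
                out.flush();
            }
        }
        if (!immediate) {
            out.flush();
        }
        return out.flushes;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("pipelining %2d: %2d flushes immediate, %d batched%n",
                    depth,
                    answerBurst(new CountingStream(), depth, true),
                    answerBurst(new CountingStream(), depth, false));
        }
    }
}
```

Both strategies send the same bytes; only the number of flushes changes. At depth 1 they are identical, which is consistent with batching bringing no gain at pipelining 1 in the table above.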

Slide 26

Slide 26 text

Keep your methods small
○ to ease method inlining

Slide 27

Slide 27 text

3% penalty? (commit ad591ec985bcc9f99be173c2ce1c18e350a662f2)

Slide 28

Slide 28 text

Just-In-Time compilation
✓ Translates Java bytecode to native code
✓ Optimises only the hot path
✓ Optimisations include method inlining, loop hoisting, dead code elimination, etc.
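A small experiment makes inlining observable. In HotSpot, a method whose bytecode is below `MaxInlineSize` (35 bytes by default) can be inlined even at rarely executed call sites, while hot call sites accept callees up to `FreqInlineSize` (325 bytes by default on x86-64); larger methods are compiled but called out of line. The sketch below is an assumed setup, not the talk's benchmark: a hot loop calling a tiny method, to be run with HotSpot's diagnostic flags. The output format of `-XX:+PrintInlining` differs across JDK versions; JITWatch can analyse the more detailed `-XX:+LogCompilation` output.

```java
// Sketch: a hot loop calling a tiny method that HotSpot is expected to inline.
// To watch the JIT's decisions (diagnostic output, format varies by JDK):
//   javac InliningDemo.java
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InliningDemo
class InliningDemo {

    // A few bytes of bytecode, far below the default MaxInlineSize (35 bytes).
    static long plus(long a, long b) {
        return a + b;
    }

    // Sums 1..n through plus(); once the loop is hot, the JIT compiles it and
    // normally inlines the plus() call into the compiled loop body.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total = plus(total, i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(50_000_000));
    }
}
```

Growing `plus()` past those limits (for example, by pasting rarely used error handling into it) is a simple way to see the inlining decision change in the same output.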

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

process error

Slide 31

Slide 31 text

process request process error

Slide 32

Slide 32 text

process request process body process error

Slide 33

Slide 33 text

handleError(request)

Slide 34

Slide 34 text

handleContent((HttpContent) msg) handleError(request)

Slide 35

Slide 35 text

reduce method size to favour inlining handleContent((HttpContent) msg) handleError(request)

Slide 36

Slide 36 text

inlining manually handleContent((HttpContent) msg) handleError(request)
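The "process request / process body / process error" split can be sketched without Netty. The dispatch method only tests the message type and delegates, so its own bytecode stays small and cheap to inline, and the rarely used error path lives in a separate method instead of inflating the hot one. `Request`, `Content` and the counters are hypothetical stand-ins for Netty's `HttpRequest`/`HttpContent` and Vert.x's connection logic.

```java
// Sketch of hot/cold splitting for a message dispatcher (hypothetical types).
class Dispatcher {

    static final class Request {
        final String uri;
        Request(String uri) { this.uri = uri; }
    }

    static final class Content {
        final int length;
        Content(int length) { this.length = length; }
    }

    int requests;
    long bodyBytes;
    int errors;
    String lastUri;

    // Hot path: only type checks and calls, so the method stays small.
    void handleMessage(Object msg) {
        if (msg instanceof Request) {
            handleRequest((Request) msg);
        } else if (msg instanceof Content) {
            handleContent((Content) msg);
        } else {
            handleError(msg);
        }
    }

    // Process request.
    private void handleRequest(Request request) {
        requests++;
        lastUri = request.uri;
    }

    // Process body.
    private void handleContent(Content content) {
        bodyBytes += content.length;
    }

    // Cold path: kept out of line so its size does not count against handleMessage.
    // A real implementation would send an error response and close the connection.
    private void handleError(Object msg) {
        errors++;
    }

    public static void main(String[] args) {
        Dispatcher d = new Dispatcher();
        d.handleMessage(new Request("/plaintext"));
        d.handleMessage(new Content(13));
        d.handleMessage("not an HTTP object");
        System.out.println(d.requests + " request(s), " + d.bodyBytes + " body bytes, " + d.errors + " error(s)");
    }
}
```

Inlining manually goes the other way: a tiny helper on the hot path can be folded into its caller when the JIT would not do it on its own. Either way, the JIT logs mentioned earlier are the place to confirm the effect.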

Slide 37

Slide 37 text

Avoid unnecessary allocation
○ to reduce GC pressure

Slide 38

Slide 38 text

class VertxHandler extends ChannelDuplexHandler {
  ...
  // Process Netty's messages
  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    context.executeFromIO(() -> {
      conn.startRead();
      handleMessage(conn, msg);
    });
  }
}

interface Context {
  ...
  void executeFromIO(Runnable handler);
  ...
}

Slide 39

Slide 39 text

class VertxHandler extends ChannelDuplexHandler {
  ...
  // Process Netty's messages
  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    context.executeFromIO(() -> {
      conn.startRead();
      handleMessage(conn, msg);
    });
  }
}

interface Context {
  ...
  void executeFromIO(Runnable handler);
  ...
}
The lambda captures msg, so it is instantiated on each call

Slide 40

Slide 40 text

class VertxHandler extends ChannelDuplexHandler {
  ...
  // Process Netty's messages
  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    context.executeFromIO(() -> {
      conn.startRead();
      handleMessage(conn, msg);
    });
  }
}

interface Context {
  ...
  void executeFromIO(Runnable handler);
  <T> void executeFromIO(T msg, Handler<T> handler);
}

Slide 41

Slide 41 text

class VertxHandler extends ChannelDuplexHandler {
  ...
  // Process Netty's messages
  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    context.executeFromIO(msg, message -> {
      conn.startRead();
      handleMessage(conn, message);
    });
  }
}

interface Context {
  ...
  void executeFromIO(Runnable handler);
  <T> void executeFromIO(T msg, Handler<T> handler);
}
The lambda no longer captures msg, so it can be instantiated once (provided it captures nothing else: if conn is an instance field, the lambda still captures this)

Slide 42

Slide 42 text

class VertxHandler extends ChannelDuplexHandler {
  ...
  // Process Netty's messages
  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    context.executeFromIO(msg, handler);
  }

  private final Handler<Object> handler = message -> {
    conn.startRead();
    handleMessage(conn, message);
  };
}
The lambda can become a field, allocated once per handler instance
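The allocation difference can be checked directly with plain Java. In this sketch `Context`, `IoHandler` and `executeFromIO` are hypothetical stand-ins for the Vert.x types (no Netty involved), and `java.util.function.Consumer` plays the role of Vert.x's `Handler`. On OpenJDK, evaluating a capturing lambda creates a new object each time, while a handler stored in a field is created once per instance. The JIT's escape analysis can sometimes remove such allocations, but that is not something to rely on on a hot I/O path.

```java
import java.util.function.Consumer;

class LambdaAllocation {

    // Hypothetical context: runs tasks directly on the calling thread.
    static final class Context {
        void executeFromIO(Runnable task) {
            task.run();
        }

        <T> void executeFromIO(T msg, Consumer<T> handler) {
            handler.accept(msg);
        }
    }

    // Hypothetical stand-in for VertxHandler.
    static final class IoHandler {
        private final Context context = new Context();
        int handled;

        // Only kept so this sketch can compare object identities.
        Object lastTask;
        Object lastHandler;

        // Before: the lambda captures msg (and this), so each call allocates a new Runnable.
        void channelReadCapturing(Object msg) {
            Runnable task = () -> handleMessage(msg);
            lastTask = task;
            context.executeFromIO(task);
        }

        // After: msg is passed as an argument and the handler is created once per IoHandler.
        private final Consumer<Object> messageHandler = this::handleMessage;

        void channelRead(Object msg) {
            lastHandler = messageHandler;
            context.executeFromIO(msg, messageHandler);
        }

        private void handleMessage(Object msg) {
            handled++;
        }
    }

    public static void main(String[] args) {
        IoHandler h = new IoHandler();
        h.channelReadCapturing("a");
        Object first = h.lastTask;
        h.channelReadCapturing("b");
        System.out.println("capturing lambda reused: " + (first == h.lastTask));
    }
}
```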

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

✓ Minimize flushing

Slide 45

Slide 45 text

✓ Minimize flushing
✓ Optimise for the Just-In-Time compiler

Slide 46

Slide 46 text

✓ Minimize flushing
✓ Optimise for the Just-In-Time compiler
✓ Keep GC cool

Slide 47

Slide 47 text

Round #15

Slide 48

Slide 48 text

Database benchmarks

Slide 49

Slide 49 text

Database benchmarks
✓ 4 benchmarks: db, queries, fortunes and updates
✓ MySQL, PostgreSQL or MongoDB
✓ 256 connections

Slide 50

Slide 50 text

At round #14
✓ JDBC + HikariCP gives the best performance in Java
✓ Vert.x uses JDBC with a worker pool
✓ Blocking is actually not an issue (???)
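"JDBC with a worker pool" means the event loop never makes the blocking call itself: it hands the call to a pool thread and gets the result back asynchronously. A minimal sketch with the JDK's `CompletableFuture.supplyAsync` follows; `blockingQuery` is a hypothetical placeholder for a JDBC call (for example through a HikariCP `DataSource`) and only sleeps to simulate a database round trip. This is not how Vert.x implements its worker pool internally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WorkerPoolOffload {

    // Hypothetical blocking call standing in for a JDBC query.
    static String blockingQuery(int id) {
        try {
            Thread.sleep(10); // simulated database round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row-" + id;
    }

    // Called from the event-loop thread: returns immediately, and the result
    // completes the future once a worker thread has run the blocking call.
    static CompletableFuture<String> query(ExecutorService workers, int id) {
        return CompletableFuture.supplyAsync(() -> blockingQuery(id), workers);
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<String> result = query(workers, 42);
            System.out.println("event loop free while waiting; result: " + result.join());
        } finally {
            workers.shutdown();
        }
    }
}
```

The trade-off is that concurrency is capped by the pool size, since each in-flight query holds a worker thread for the whole round trip.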

Slide 51

Slide 51 text

Handling the problem
✓ Focus on PostgreSQL
✓ Bad results were actually due to mistakes
  - Missing MongoDB $id for index, resulting in bad performance
  - No transaction used in UPDATES, causing abysmal results

Slide 52

Slide 52 text

The reactive PostgreSQL client
✓ Goals
  - Simple, clean and straightforward API
  - Non-blocking
  - Performance
  - Lightweight
✓ Non-goals
  - A driver
  - An abstraction

Slide 53

Slide 53 text

Round trips to PostgreSQL
[Diagram: query, then result, then the next query and its result, one round trip per query]

Slide 54

Slide 54 text

Pipelining to increase concurrency
[Diagram: three queries sent back to back, followed by their three results]
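A back-of-the-envelope model shows why pipelining matters when latency dominates. If one connection keeps up to k queries in flight, N queries need about ceil(N / k) round trips instead of N. This is an idealized sketch: it assumes a single connection and ignores server processing time and bandwidth, so it illustrates the ceil(N / k) effect rather than reproducing the measurements in the charts.

```java
// Idealized model of query pipelining on a single connection.
class PipeliningModel {

    // Number of round trips for N queries with up to k in flight: ceil(N / k).
    static long roundTrips(long queries, int pipeliningLevel) {
        return (queries + pipeliningLevel - 1) / pipeliningLevel;
    }

    // Time spent only on network latency, in seconds (server time ignored).
    static double idealSeconds(long queries, int pipeliningLevel, double rttMillis) {
        return roundTrips(queries, pipeliningLevel) * rttMillis / 1000.0;
    }

    public static void main(String[] args) {
        for (int level : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("level %2d: %4d round trips, %.3f s of latency at 1 ms RTT%n",
                    level, roundTrips(5000, level), idealSeconds(5000, level, 1.0));
        }
    }
}
```

For 5000 queries, level 16 needs 313 round trips instead of 5000, and the saving grows with the round-trip time.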

Slide 55

Slide 55 text

[Chart: Running 5000 queries (100µs ping), total time (seconds) by pipelining level (1, 2, 4, 8, 16), JDBC vs Reactive client]

Slide 56

Slide 56 text

[Chart: Running 5000 queries (1ms ping), total time (seconds) by pipelining level (1, 2, 4, 8, 16), JDBC vs Reactive client]

Slide 57

Slide 57 text

Round #15

Slide 58

Slide 58 text

CI - Java - unofficial

Slide 59

Slide 59 text

The reactive SQL client
✓ Part of the Vert.x stack since 3.8 as SQL Client
✓ Supports more databases
  - PostgreSQL
  - MySQL
  - SQL Server soon

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

Let there be pipelining

Slide 68

Slide 68 text

What did we learn?
✓ TFB does not favour non-blocking designs
✓ The JVM is a great place for performance
✓ There are trade-offs between usability and performance
✓ RDBMS protocol design is a bottleneck
✓ Protocol concurrency matters

Slide 69

Slide 69 text

TechEmpower Framework Benchmarks: https://www.techempower.com/benchmarks/
Reactive PostgreSQL Client: https://github.com/eclipse-vertx/vertx-sql-client
Async Profiler: https://github.com/jvm-profiling-tools/async-profiler
Flame Graphs: https://github.com/brendangregg/FlameGraph
JITWatch: https://github.com/AdoptOpenJDK/jitwatch