Netty - One Framework to rule them all

Netty is one of the best-known and most widely used (if not the most widely used) asynchronous network application frameworks for the JVM.
This talk will show you how Netty itself works and explain why some design choices were made. Besides this, it will include war stories about
the many JVM-related challenges the Netty community has faced during Netty's development and explain what actions were taken to work around them.

Reactive Summit 2016:
https://reactivesummit2016.sched.org/event/8B6M/netty-one-framework-to-rule-them-all

Norman Maurer

October 05, 2016

Transcript

  1. Netty
    One Framework to rule them all….

  2. Leading the Netty Project
    Apache Cassandra MVP 2016 - 2017
    Author of Netty in Action
    Apache Software Foundation
    Eclipse Foundation
    Norman Maurer

  3. A long journey

  4. Some background
    Netty 3.0.0.GA released in 2008
    Netty 4.0.0.Final released in 2013
    Netty 4.1.0.Final released in 2016
    one of the most widely used network frameworks for the JVM
    founded by Trustin Lee <3
    started as a JBoss project, then became independent
    very vibrant community

  5. Netty 3.x
    too much garbage
    too many memory copies
    no good memory pool included
    not optimized for Linux-based OSes
    threading model not easy to reason about
    Still, it worked great! … Kind of, at least.

  6. Netty 4.x now!
    creates less garbage, so less GC
    optimized for Linux-based OSes, plus Linux-only features
    high-performance buffer pool based on the jemalloc paper
    well-defined, easy-to-use threading model
    And there is more to come…
    Optimize all the things!

  7. ChannelPipeline
    [Diagram: a Channel whose ChannelPipeline holds three ChannelInboundHandlers and two ChannelOutboundHandlers; inbound data (IN) flows through the inbound handlers, outbound data (OUT) through the outbound handlers]
    Inbound events -> ChannelInboundHandler
    Outbound events -> ChannelOutboundHandler

  8. Combine handlers like UNIX commands via pipes
    • Interceptor pattern
    • allows adding building blocks (ChannelHandlers) to the ChannelPipeline on the fly, which transform data or react to events

    $ echo "Netty is slow...." | sed -e 's/slow/fast/' | cat
    Netty is fast....

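The pipe analogy translates directly into code. Below is a minimal plain-Java toy, not Netty's API: `ToyHandler` and `ToyPipeline` are hypothetical names. Each handler receives a message plus a `next` callback, so it can transform the message, swallow it, or pass it on, which is the interceptor pattern from the slide; the two handlers in `main` mirror the `sed` and `cat` stages of the shell example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy model of an inbound pipeline (not Netty's ChannelPipeline API).
// Each handler sees the message and decides whether, and what, to pass on.
interface ToyHandler {
    void onRead(String msg, Consumer<String> next);
}

final class ToyPipeline {
    private final List<ToyHandler> handlers = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>(); // messages that reached the end

    // Handlers are appended in order, like building blocks.
    ToyPipeline addLast(ToyHandler handler) {
        handlers.add(handler);
        return this;
    }

    // Feed an inbound message into the first handler.
    void fireRead(String msg) {
        invoke(0, msg);
    }

    // Call handler i; its "next" callback continues with handler i + 1.
    private void invoke(int i, String msg) {
        if (i == handlers.size()) {
            delivered.add(msg);
            return;
        }
        handlers.get(i).onRead(msg, m -> invoke(i + 1, m));
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = new ToyPipeline()
                .addLast((msg, next) -> next.accept(msg.replaceFirst("slow", "fast"))) // like sed
                .addLast((msg, next) -> next.accept(msg));                             // like cat
        pipeline.fireRead("Netty is slow....");
        System.out.println(pipeline.delivered()); // prints [Netty is fast....]
    }
}
```

Because a handler only knows its `next` callback, handlers stay independent of each other. In real Netty, handlers receive a `ChannelHandlerContext` instead and forward with calls such as `ctx.fireChannelRead(msg)`.
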
  9. Too much garbage
    Run collector ….run!

  10. Reduce Garbage
    reduce GC by replacing event objects with direct method invocations
    lightweight object pool for heavily allocated objects (like ByteBuf instances)
    Allocating an object is often not the problem, collecting it is

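The "lightweight object pool" idea can be sketched in plain Java. `ToyObjectPool` is a hypothetical, single-threaded illustration in the spirit of Netty's `Recycler`, not its implementation: released objects go onto a free list and are handed out again, so a hot loop allocates one object instead of a million.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical single-threaded object pool, far simpler than Netty's Recycler.
// Reusing instances means fewer short-lived objects for the GC to collect.
final class ToyObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxCapacity;
    private int created; // number of objects actually allocated

    ToyObjectPool(Supplier<T> factory, int maxCapacity) {
        this.factory = factory;
        this.maxCapacity = maxCapacity;
    }

    // Hand out a pooled instance if one is available, otherwise allocate a new one.
    T get() {
        T obj = free.pollFirst();
        if (obj == null) {
            created++;
            obj = factory.get();
        }
        return obj;
    }

    // Give an instance back; when the pool is full it is simply left to the GC.
    void recycle(T obj) {
        if (free.size() < maxCapacity) {
            free.addFirst(obj);
        }
    }

    int created() {
        return created;
    }

    public static void main(String[] args) {
        ToyObjectPool<StringBuilder> pool = new ToyObjectPool<>(StringBuilder::new, 16);
        for (int i = 0; i < 1_000_000; i++) {
            StringBuilder sb = pool.get();
            sb.setLength(0);                  // reset state before reuse
            sb.append("event ").append(i);
            pool.recycle(sb);
        }
        System.out.println(pool.created());   // prints 1
    }
}
```

Callers must reset an object's state before reusing it (here `setLength(0)`); forgetting that is the classic pooling bug.
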
  11. JNI to the rescue
    optimized transport for Linux only
    supports Linux-specific features
    operates directly on pointers for buffers
    synchronization optimized for Netty's threading model
    [Diagram: Java ↔ JNI ↔ C/C++]

  12. Native Transport
    epoll-based high-performance transport
    less GC pressure due to fewer objects
    advanced features:
    SO_REUSEPORT
    TCP_CORK
    TCP_NOTSENT_LOWAT
    TCP_FASTOPEN
    TCP_INFO
    level-triggered (LT) and edge-triggered (ET) modes
    Unix Domain Sockets
    NIO Transport:
    Bootstrap bootstrap = new Bootstrap().group(new NioEventLoopGroup());
    bootstrap.channel(NioSocketChannel.class);
    Native Transport:
    Bootstrap bootstrap = new Bootstrap().group(new EpollEventLoopGroup());
    bootstrap.channel(EpollSocketChannel.class);

  13. Buffers
    Performance vs Complexity

  14. ByteBuf
    ByteBufs are reference counted (huh!?!?)
    pooling is used by default
    provides a LeakDetector that helps detect ByteBuf leaks
    direct memory is used by default
    provides special abstractions to iterate over bytes to reduce branching / range-checks
    all buffers are dynamic and can grow
    Writing Java as if it were C?!?

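Why reference counting? Pooled and direct buffers must be handed back explicitly, because the GC cannot know when their memory may be reused. `ToyRefCountedBuffer` below is a hypothetical stand-in that follows the same contract as Netty's `ReferenceCounted` interface: the count starts at 1, `retain()` increments it, `release()` decrements it, and the memory is given back when it reaches 0.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical toy buffer with a retain()/release() contract (not Netty's ByteBuf).
final class ToyRefCountedBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1); // the creator holds one reference
    private final byte[] memory;
    private final Consumer<byte[]> deallocator;                 // e.g. "give back to the pool"

    ToyRefCountedBuffer(int capacity, Consumer<byte[]> deallocator) {
        this.memory = new byte[capacity];
        this.deallocator = deallocator;
    }

    int refCnt() {
        return refCnt.get();
    }

    // Another component wants to keep the buffer alive.
    ToyRefCountedBuffer retain() {
        for (;;) {
            int cnt = refCnt.get();
            if (cnt == 0) {
                throw new IllegalStateException("buffer already released");
            }
            if (refCnt.compareAndSet(cnt, cnt + 1)) {
                return this;
            }
        }
    }

    // Returns true if this call dropped the last reference and freed the memory.
    boolean release() {
        for (;;) {
            int cnt = refCnt.get();
            if (cnt == 0) {
                throw new IllegalStateException("buffer released too often");
            }
            if (refCnt.compareAndSet(cnt, cnt - 1)) {
                if (cnt == 1) {
                    deallocator.accept(memory);
                    return true;
                }
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ToyRefCountedBuffer buf = new ToyRefCountedBuffer(256,
                mem -> System.out.println("freed " + mem.length + " bytes"));
        buf.retain();  // handed to a second component: refCnt == 2
        buf.release(); // first owner done: refCnt == 1
        buf.release(); // last owner done: prints "freed 256 bytes"
    }
}
```

A leak is a buffer whose count never reaches 0, which is what a leak detector reports; releasing too often is the opposite bug, which this toy turns into an exception.
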
  15. Buffer Pooling
    Allocations are expensive

  16. Allocation times
    [Chart: allocation time in nanoseconds (axis 0 to 6000) by buffer size in bytes (0, 256, 1024, 4096, 16384, 65536) for Unpooled Heap, Pooled Heap, Unpooled Direct and Pooled Direct buffers]

  17. PooledByteBufAllocator
    based on the jemalloc paper (3.x)
    ThreadLocal caches for lock-free allocation
    synchronization per Arena, where each Arena holds different chunks of memory
    different size classes to reduce fragmentation
    [Diagram: Thread 1 and Thread 2, each with its own ThreadLocal cache (Cache 1, Cache 2), allocating from Arena 1, Arena 2 and Arena 3, each divided into size classes]

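A heavily simplified, hypothetical model of the allocator's fast and slow paths. `ToySizeClassAllocator` is an illustrative name, and the power-of-two size classes, the 16-byte minimum and the per-thread cache limit are assumptions rather than Netty's actual tables: requests are rounded up to a size class, each thread first tries its own `ThreadLocal` cache without any lock, and only then synchronizes on the shared arena.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "size classes + ThreadLocal cache + shared arena".
// Not Netty's PooledByteBufAllocator; size classes and limits are assumptions.
final class ToySizeClassAllocator {
    static final int MIN_SIZE = 16;
    static final int MAX_CACHED_PER_CLASS = 4;

    // Round a request up to its size class: the next power of two, at least MIN_SIZE.
    static int sizeClass(int requested) {
        if (requested <= MIN_SIZE) {
            return MIN_SIZE;
        }
        int highest = Integer.highestOneBit(requested);
        return highest == requested ? requested : highest << 1;
    }

    // Shared arena: free lists per size class, guarded by a lock.
    private final Map<Integer, ArrayDeque<byte[]>> arena = new HashMap<>();

    // Per-thread cache: only the owning thread touches it, so no lock is needed.
    private final ThreadLocal<Map<Integer, ArrayDeque<byte[]>>> cache =
            ThreadLocal.withInitial(HashMap::new);

    byte[] allocate(int requested) {
        int size = sizeClass(requested);
        ArrayDeque<byte[]> local = cache.get().get(size);
        if (local != null && !local.isEmpty()) {
            return local.pop();                 // fast path: lock-free
        }
        synchronized (arena) {                  // slow path: synchronize on the arena
            ArrayDeque<byte[]> shared = arena.get(size);
            if (shared != null && !shared.isEmpty()) {
                return shared.pop();
            }
        }
        return new byte[size];                  // nothing pooled yet: allocate
    }

    // Expects memory obtained from allocate(), so its length is already a size class.
    void free(byte[] memory) {
        ArrayDeque<byte[]> local =
                cache.get().computeIfAbsent(memory.length, k -> new ArrayDeque<>());
        if (local.size() < MAX_CACHED_PER_CLASS) {
            local.push(memory);                 // keep it for this thread
            return;
        }
        synchronized (arena) {                  // cache full: give it back to the arena
            arena.computeIfAbsent(memory.length, k -> new ArrayDeque<>()).push(memory);
        }
    }
}
```

Because a freed buffer goes into the freeing thread's cache first, the common allocate/free cycle on one IO thread never touches the arena lock in this toy.
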
  18. Threading Model
    Writing multi-threaded applications is hard….

  19. IO-Thread Threading Model
    all events / operations are handled by the IO-Thread!
    eliminates the need for synchronization completely (as long as the handler is not shared!)
    writing single-threaded code FTW
    [Diagram: Thread 1 and Thread 2, two Channels, and a ChannelPipeline with a ChannelInboundHandler and a ChannelOutboundHandler]

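The "everything happens on the IO-Thread" rule is typically enforced with one check: if the caller already runs on the loop thread, do the work inline, otherwise enqueue it as a task (Netty exposes this check as `EventLoop.inEventLoop()`). `ToyLoop` and `runOnLoop` below are hypothetical names for a plain-JDK sketch built on a single-threaded `ExecutorService`; the `bytesRead` counter is only ever touched by that one thread, so it needs no lock.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of thread confinement: all state changes happen on one "IO thread",
// so the state itself needs no locks.
final class ToyLoop {
    private volatile Thread loopThread;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "toy-io-thread");
        loopThread = t;
        return t;
    });

    private long bytesRead; // only read and written on the loop thread

    boolean inLoop() {
        return Thread.currentThread() == loopThread;
    }

    // Run inline when already on the loop thread, otherwise enqueue it for the loop thread.
    void runOnLoop(Runnable task) {
        if (inLoop()) {
            task.run();
        } else {
            executor.execute(task);
        }
    }

    // May be called from any thread.
    void onRead(int n) {
        runOnLoop(() -> bytesRead += n);
    }

    // Reads the counter on the loop thread; must not be called from the loop thread itself.
    long bytesRead() {
        try {
            return executor.submit(() -> bytesRead).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

This is also why the caveat on the slide matters: a handler instance shared between channels may be called from several IO threads and then does need synchronization.
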
  20. Write Semantics
    syscalls are expensive…

  21. Write Semantics
    Channel.write(…) only puts messages into the ChannelOutboundBuffer once they are processed; nothing is written to the socket yet.
    Channel.flush() flushes everything in the ChannelOutboundBuffer and so calls writev(…).
    [Diagram: Channel.write(…) queues msg entries in the ChannelOutboundBuffer on the Java side; Channel.flush() hands them to writev(…) on the JNI / C side]

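The write/flush split can be modeled with a queue: `write` only enqueues, and `flush` hands everything queued to gathering writes. `ToyOutboundBuffer` is a hypothetical name, not Netty's `ChannelOutboundBuffer`; it uses the JDK's `GatheringByteChannel.write(ByteBuffer[])` (which the JDK usually implements with `writev` on Unix-like systems) against a `Pipe` standing in for a socket.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Hypothetical toy model of write vs. flush (not Netty's ChannelOutboundBuffer).
final class ToyOutboundBuffer {
    private final ArrayDeque<ByteBuffer> pending = new ArrayDeque<>();
    private final GatheringByteChannel channel;
    private int gatheringWrites; // number of write(ByteBuffer[]) calls issued

    ToyOutboundBuffer(GatheringByteChannel channel) {
        this.channel = channel;
    }

    // write(): only queue the message; no I/O happens here.
    void write(ByteBuffer msg) {
        pending.add(msg);
    }

    // flush(): hand all queued messages to gathering writes instead of one write per message.
    void flush() {
        ByteBuffer[] batch = pending.toArray(new ByteBuffer[0]);
        pending.clear();
        try {
            while (remaining(batch) > 0) {      // loop in case of a partial write
                channel.write(batch);
                gatheringWrites++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long remaining(ByteBuffer[] buffers) {
        long total = 0;
        for (ByteBuffer b : buffers) {
            total += b.remaining();
        }
        return total;
    }

    int pendingCount() {
        return pending.size();
    }

    int gatheringWrites() {
        return gatheringWrites;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();                // stands in for a socket
        ToyOutboundBuffer out = new ToyOutboundBuffer(pipe.sink());
        for (String part : new String[] {"Hello", ", ", "Netty"}) {
            out.write(ByteBuffer.wrap(part.getBytes(StandardCharsets.UTF_8)));
        }
        out.flush();
        ByteBuffer in = ByteBuffer.allocate(12);
        while (in.hasRemaining()) {
            pipe.source().read(in);
        }
        System.out.println(new String(in.array(), StandardCharsets.UTF_8)); // prints Hello, Netty
    }
}
```

Three `write` calls followed by one `flush` typically end up as a single gathering write here, instead of three separate syscalls.
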
  22. Read Semantics
    Fine grained control FTW

  23. Read Semantics
    ChannelConfig.setAutoRead(boolean) to the rescue.
    ChannelConfig.setMaxMessagesPerRead(int) limits the maximum number of messages read in one read loop.
    Channel.read() explicitly triggers a read.
    RecvByteBufAllocator gives even more flexibility
    int i = 0;
    while (i++ < messagesPerRead) {
        read(…);
    }

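A hypothetical plain-Java model of the read loop makes these knobs concrete. `ToyReader`, `onReadable` and the `Queue` standing in for the socket are illustrative assumptions, not Netty's code: the loop stops after `maxMessagesPerRead` messages so one busy channel cannot starve the others on the same IO thread, and with autoRead off nothing is read until `read()` is called explicitly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a channel's read loop (not Netty's implementation).
final class ToyReader {
    private final Queue<String> socket;        // stands in for data waiting in the kernel
    private final int maxMessagesPerRead;
    private boolean autoRead = true;
    final List<String> received = new ArrayList<>();

    ToyReader(Queue<String> socket, int maxMessagesPerRead) {
        this.socket = socket;
        this.maxMessagesPerRead = maxMessagesPerRead;
    }

    void setAutoRead(boolean autoRead) {
        this.autoRead = autoRead;
    }

    // Called by the event loop whenever the socket is readable.
    void onReadable() {
        if (autoRead) {
            readLoop();
        }
    }

    // Explicitly trigger one read loop, e.g. once the application is ready for more data.
    void read() {
        readLoop();
    }

    // Read at most maxMessagesPerRead messages, so other channels on the loop get a turn.
    private void readLoop() {
        int i = 0;
        while (i++ < maxMessagesPerRead) {
            String msg = socket.poll();
            if (msg == null) {
                break;                          // nothing left to read right now
            }
            received.add(msg);
        }
    }

    public static void main(String[] args) {
        Queue<String> socket = new ArrayDeque<>(Arrays.asList("m1", "m2", "m3", "m4", "m5"));
        ToyReader reader = new ToyReader(socket, 2);
        reader.onReadable();                    // reads m1, m2
        reader.setAutoRead(false);              // stop reading automatically
        reader.onReadable();                    // reads nothing
        reader.read();                          // explicit read: m3, m4
        System.out.println(reader.received);    // prints [m1, m2, m3, m4]
    }
}
```

Turning autoRead off and calling `read()` on demand is a simple way to apply backpressure, for example while a slow downstream peer catches up.
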
  24. IO-Threads
    Never-ever block the IO-Thread!

  25. EventLoop(Group)
    IO Thread abstracted as EventLoop
    easily share the same EventLoop between server and client
    explicitly use the same EventLoop for the accepted connection and the outbound connection (a win for proxy applications!)
    Bonus: EventLoop is also a ScheduledExecutorService
    for (;;) {
        waitForEventsOrTasks();
        processEvents();
        processTasks();
        processScheduledTasks();
    }

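The `for (;;)` loop on the slide can be turned into a small runnable model. `ToyEventLoop` and its methods are hypothetical names; there is no real I/O, so `processEvents()` is omitted and "waiting for events or tasks" becomes a timed `poll` on a blocking queue. Scheduled tasks live in a priority queue ordered by deadline that only the loop thread touches.

```java
import java.util.PriorityQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical single-threaded event loop: a task queue plus deadline-ordered scheduled tasks.
// There is no real I/O here, so processEvents() from the slide is omitted.
final class ToyEventLoop implements Runnable {
    private static final class Scheduled implements Comparable<Scheduled> {
        final long deadlineNanos;
        final Runnable task;

        Scheduled(long deadlineNanos, Runnable task) {
            this.deadlineNanos = deadlineNanos;
            this.task = task;
        }

        @Override
        public int compareTo(Scheduled o) {
            return Long.compare(deadlineNanos, o.deadlineNanos);
        }
    }

    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final PriorityQueue<Scheduled> scheduled = new PriorityQueue<>(); // loop thread only
    private volatile boolean running = true;

    // May be called from any thread.
    void execute(Runnable task) {
        tasks.add(task);
    }

    // The priority queue belongs to the loop thread, so the insertion itself is a task.
    void schedule(Runnable task, long delay, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(delay);
        execute(() -> scheduled.add(new Scheduled(deadline, task)));
    }

    void shutdown() {
        execute(() -> running = false);
    }

    @Override
    public void run() {
        while (running) {
            // waitForEventsOrTasks(): block until a task arrives or the next deadline is due.
            long timeoutNanos = scheduled.isEmpty()
                    ? TimeUnit.MILLISECONDS.toNanos(100)
                    : Math.max(0, scheduled.peek().deadlineNanos - System.nanoTime());
            try {
                Runnable first = tasks.poll(timeoutNanos, TimeUnit.NANOSECONDS);
                if (first != null) {
                    first.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // processTasks(): drain whatever else is queued right now.
            Runnable task;
            while ((task = tasks.poll()) != null) {
                task.run();
            }
            // processScheduledTasks(): run every scheduled task whose deadline has passed.
            long now = System.nanoTime();
            while (!scheduled.isEmpty() && scheduled.peek().deadlineNanos - now <= 0) {
                scheduled.poll().task.run();
            }
        }
    }
}
```

Start it with `new Thread(loop).start()`; `execute` and `schedule` may then be called from any thread.
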
  26. Work outside the IO-Thread
    sometimes you need to block

  27. EventExecutor(Group)
    part of the core itself
    adding a ChannelHandler with an EventExecutorGroup gets the job done
    different EventExecutorGroup implementations for serial / non-serial execution
    supports moving work to another EventLoop
    ChannelPipeline pipeline = …;
    pipeline.addLast(executorGroup, new ExecutionHandler(…));

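In Netty, the `pipeline.addLast(executorGroup, …)` call on the slide takes care of the hand-off. The plain-JDK sketch below only illustrates the idea, and all names in it are hypothetical: the IO thread passes the blocking call to a separate pool and receives the result back as a task on its own thread, so it never blocks and the handler state stays single-threaded.

```java
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: keep the IO thread free by running blocking work on a separate pool
// and continuing on the IO thread once the result is ready.
final class ToyOffload {
    // Stands in for the Channel's EventLoop (one thread).
    final ExecutorService ioThread = Executors.newSingleThreadExecutor(r -> new Thread(r, "toy-io"));
    // Stands in for an EventExecutorGroup used for blocking handlers.
    final ExecutorService blockingPool = Executors.newFixedThreadPool(4);

    // Simulated blocking call, e.g. a database lookup (assumption for illustration).
    static String blockingLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key.toUpperCase(Locale.ROOT);
    }

    // Called on the IO thread: it never blocks here, it only schedules work.
    void onMessage(String key, Consumer<String> replyOnIoThread) {
        CompletableFuture
                .supplyAsync(() -> blockingLookup(key), blockingPool) // blocking part runs elsewhere
                .thenAcceptAsync(replyOnIoThread, ioThread);          // result handled on the IO thread
    }

    void shutdown() {
        blockingPool.shutdown();
        ioThread.shutdown();
    }

    public static void main(String[] args) {
        ToyOffload app = new ToyOffload();
        CompletableFuture<String> done = new CompletableFuture<>();
        app.ioThread.execute(() -> app.onMessage("netty",
                s -> done.complete(s + " on " + Thread.currentThread().getName())));
        System.out.println(done.join()); // prints NETTY on toy-io
        app.shutdown();
    }
}
```
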
  28. JNI based SSLEngine
    … to the rescue
    [Diagram: Java ↔ JNI ↔ C/C++]

  29. SSLEngine implementations
    [Charts: requests/sec (axis 0 to 600,000) and transfer in MB/sec (axis 0 to 70) for OpenSslEngine vs. SSLEngineImpl]

  30. SSLEngine implementations
    OpenSslEngine vs. SSLEngineImpl

  31. OpenSslEngine
    drop-in replacement for the JDK SSLEngine (SSLEngineImpl)
    gives you up to 6x the performance
    less memory usage
    less GC
    SslContextBuilder.forServer(…)
        .sslProvider(SslProvider.OPENSSL)
        .build();

  32. Netty and the JVM
    A Hate-Love-Relationship

  33. Direct memory management
    the whole idea of managing direct memory via the Garbage-Collector is fundamentally broken
    static synchronized in the allocation and deallocation methods for direct memory
    there is even a Thread.sleep(100) and a System.gc() in there?!?
    Now you made me cry…

  34. Memory Layout - ENOCONTROL
    no easy way to control memory layout (all these hacks…)
    false sharing is a real issue in our own data structures
    @Contended does not help at all in practice
    Gimme more control now!

  35. JNI
    nasty “hacks” are needed to get good performance
    includes things like writing structs directly via sun.misc.Unsafe (no joke!)
    calling from JNI into Java methods is SUPER-expensive
    [Diagram: Java ↔ JNI ↔ C/C++]

  36. NIO / IO and others
    NIO.2 is no real improvement over NIO
    produces too much garbage and so GC overhead
    ByteBuffer API is not user-friendly (flip all the things!)
    IOException / ConnectException are too generic to be useful
    creating a String from a byte[] / char[] is not possible without a memory copy
    java.util.concurrent.Future was (and still is) a disaster

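To illustrate "flip all the things!": a `java.nio.ByteBuffer` uses one position for both writing and reading, so you must call `flip()` after writing and before reading (and `clear()` or `compact()` before writing again). Netty's `ByteBuf` keeps a separate `readerIndex` and `writerIndex` instead. A small JDK-only example (`FlipDemo` is a hypothetical name):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Shows why java.nio.ByteBuffer needs flip(): one position is used for both writing and reading.
final class FlipDemo {
    static String writeThenRead(String text, boolean flip) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put(text.getBytes(StandardCharsets.UTF_8)); // position is now after the written bytes
        if (flip) {
            buf.flip();                                  // limit = position, position = 0
        }
        byte[] out = new byte[buf.remaining()];          // without flip: the unwritten tail
        buf.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead("Netty", true).length());  // prints 5
        System.out.println(writeThenRead("Netty", false).length()); // prints 59 (zero bytes)
    }
}
```

Forgetting `flip()` does not fail loudly; the read silently returns the unwritten, zero-filled tail of the buffer.
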
  37. Get my book…
    Ka-ching!

  38. Questions?

  39. Thanks!
