Netty - One Framework to rule them all

Netty is one of the best known and most widely used (if not the most widely used) asynchronous network application frameworks for the JVM.
This talk will show you how Netty itself works and explain why some design choices were made. Besides this, it will include war stories about
the many JVM-related challenges the Netty community has faced during Netty development and explain what actions were taken to work around them.

Reactive Summit 2016:
https://reactivesummit2016.sched.org/event/8B6M/netty-one-framework-to-rule-them-all

Norman Maurer

October 05, 2016

Transcript

  1. Netty One Framework to rule them all….

  2. Norman Maurer

    • Leading the Netty Project
    • Apache Cassandra MVP 2016 - 2017
    • Author of Netty in Action
    • Apache Software Foundation
    • Eclipse Foundation
  3. A long journey

  4. Some background

    • Netty 3.0.0.GA released in 2008
    • Netty 4.0.0.Final released in 2013
    • Netty 4.1.0.Final released in 2016
    • one of the most used network frameworks for the JVM
    • founded by Trustin Lee <3
    • JBoss project first, then independent
    • very vibrant community
  5. Netty 3.x

    • too much garbage
    • too many memory copies
    • no good memory pool included
    • not optimized for Linux based OS
    • threading model not easy to reason about

    Still it worked great! ….. Kind of at least.
  6. Netty 4.x now!

    • creates less garbage, less GC
    • optimized for Linux based OS + Linux only features
    • high performance buffer pool based on jemalloc paper
    • well defined, easy to use threading model

    And there is more to come…. Optimize all the things!
  7. ChannelPipeline

    • Inbound events -> ChannelInboundHandler
    • Outbound events -> ChannelOutboundHandler

    [Diagram: a Channel and its ChannelPipeline; Channel Inbound Handlers on the IN path, Channel Outbound Handlers on the OUT path]
  8. ChannelPipeline: combine handlers as UNIX commands via pipes

    • Interceptor pattern
    • allows adding building blocks (ChannelHandler) on the fly that transform data or react to events

    $ echo "Netty is slow...." | sed -e 's/slow/fast/' | cat
    Netty is fast....
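
The pipe analogy on this slide can be sketched in plain Java. This is only an illustration of the interceptor idea using the JDK; `MiniPipeline` and `MiniPipelineDemo` are hypothetical names and none of this is Netty's actual ChannelPipeline API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for a pipeline of handlers; not Netty's API.
final class MiniPipeline {
    private final List<Function<String, String>> handlers = new ArrayList<>();

    // Append a handler at the tail; handlers can be added at any time.
    MiniPipeline addLast(Function<String, String> handler) {
        handlers.add(handler);
        return this;
    }

    // Pass the message through every handler in order, like data through UNIX pipes.
    String fire(String msg) {
        String current = msg;
        for (Function<String, String> handler : handlers) {
            current = handler.apply(current);
        }
        return current;
    }
}

final class MiniPipelineDemo {
    public static void main(String[] args) {
        MiniPipeline pipeline = new MiniPipeline()
                .addLast(s -> s.replace("slow", "fast")) // plays the role of sed
                .addLast(s -> s);                        // plays the role of cat
        System.out.println(pipeline.fire("Netty is slow...."));
    }
}
```

Each handler only sees the output of the one before it, which is what makes the building blocks easy to add, remove or reorder.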
  9. Too much garbage Run collector ….run!

  10. Reduce Garbage

    • eliminate GC by replacing event objects with direct method invocations
    • light-weight object pool for heavily allocated objects (like ByteBuf instances)

    Allocating an Object is often not the problem, collecting it is
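
A light-weight object pool can be sketched with a per-thread stack of reusable instances. The sketch below assumes a simple design built only on the JDK; `SimplePool`, `maxCached` and the demo class are hypothetical and do not reflect how Netty's own pool is implemented.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Hypothetical sketch of a per-thread object pool; not Netty's implementation.
final class SimplePool<T> {
    private final Supplier<T> factory;
    private final int maxCached;
    // One stack per thread, so acquire/recycle need no locking.
    private final ThreadLocal<ArrayDeque<T>> cache = ThreadLocal.withInitial(ArrayDeque::new);

    SimplePool(Supplier<T> factory, int maxCached) {
        this.factory = factory;
        this.maxCached = maxCached;
    }

    // Reuse a cached instance if this thread has one, otherwise allocate.
    T acquire() {
        T pooled = cache.get().pollFirst();
        return pooled != null ? pooled : factory.get();
    }

    // Hand an instance back; it is dropped (left to the GC) if the cache is full.
    void recycle(T obj) {
        ArrayDeque<T> stack = cache.get();
        if (stack.size() < maxCached) {
            stack.addFirst(obj);
        }
    }
}

final class SimplePoolDemo {
    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 16);
        StringBuilder first = pool.acquire();
        pool.recycle(first);
        // The same instance comes back instead of a fresh allocation.
        System.out.println(pool.acquire() == first);
    }
}
```

A caller has to reset any state on a recycled object before reuse; the sketch leaves that out.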
  11. JNI to the rescue

    • optimized transport for Linux only
    • supports Linux specific features
    • directly operate on pointers for buffers
    • synchronization optimized for Netty’s threading model

    [Diagram: Java, JNI, C/C++]
  12. Native Transport

    • epoll based high-performance transport
    • less GC pressure due to fewer objects
    • advanced features: SO_REUSEPORT, TCP_CORK, TCP_NOTSENT_LOWAT, TCP_FASTOPEN, TCP_INFO, LT and ET, Unix Domain Sockets

    NIO Transport:

        Bootstrap bootstrap = new Bootstrap().group(new NioEventLoopGroup());
        bootstrap.channel(NioSocketChannel.class);

    Native Transport:

        Bootstrap bootstrap = new Bootstrap().group(new EpollEventLoopGroup());
        bootstrap.channel(EpollSocketChannel.class);
  13. Buffers Performance vs Complexity

  14. ByteBuf

    • ByteBufs are reference counted (huh!?!?)
    • pooling is used by default
    • provides a LeakDetector which helps detect ByteBuf leaks
    • direct memory is used by default
    • provides special abstractions to iterate over bytes to reduce branching / range-checks
    • all buffers are dynamic and can grow

    Writing Java as if it were C ?!?
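
Reference counting in general can be sketched with an atomic counter. The class below is a hypothetical, JDK-only illustration of the retain/release idea; `RefCountedBuffer` is an assumed name and this is not how ByteBuf itself is implemented.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a reference-counted buffer; not Netty's ByteBuf.
final class RefCountedBuffer {
    // The creator holds the first reference.
    private final AtomicInteger refCnt = new AtomicInteger(1);
    private volatile byte[] data;

    RefCountedBuffer(int capacity) {
        data = new byte[capacity];
    }

    int refCnt() {
        return refCnt.get();
    }

    // Take an additional reference; fails if the buffer was already freed.
    RefCountedBuffer retain() {
        for (;;) {
            int current = refCnt.get();
            if (current == 0) {
                throw new IllegalStateException("already released");
            }
            if (refCnt.compareAndSet(current, current + 1)) {
                return this;
            }
        }
    }

    // Drop one reference; returns true if this call freed the buffer.
    boolean release() {
        for (;;) {
            int current = refCnt.get();
            if (current == 0) {
                throw new IllegalStateException("already released");
            }
            if (refCnt.compareAndSet(current, current - 1)) {
                if (current == 1) {
                    data = null; // a pooled design would return the memory to its pool here
                    return true;
                }
                return false;
            }
        }
    }
}

final class RefCountedBufferDemo {
    public static void main(String[] args) {
        RefCountedBuffer buf = new RefCountedBuffer(256);
        buf.retain();                      // a second owner appears
        System.out.println(buf.release()); // first owner is done, buffer stays alive
        System.out.println(buf.release()); // last owner is done, buffer is freed
    }
}
```

Forgetting the final `release()` is exactly the kind of leak a leak detector is meant to catch.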
  15. Buffer Pooling Allocations are expensive

  16. Allocation times

    [Chart: allocation time in nanoseconds (axis 0 to 6000) against buffer size in bytes (0 to 65536) for Unpooled Heap, Pooled Heap, Unpooled Direct and Pooled Direct]
  17. PooledByteBufAllocator

    • based on jemalloc paper (3.x)
    • ThreadLocal caches for lock-free allocation
    • synchronize per Arena that holds the different chunks of memory
    • different size classes
    • reduce fragmentation

    [Diagram labels: Thread 1, Thread 2, ThreadLocal Cache 1, ThreadLocal Cache 2, Arena 1, Arena 2, Arena 3, Size-classes]
  18. Threading Model Writing multi-threaded applications is hard….

  19. IO-Thread Threading-Model

    • all events / operations are done by the IO-Thread!
    • eliminates the need for synchronization completely (as long as the handler is not shared!)
    • writing single-threaded code FTW

    [Diagram labels: Channel, Thread 1, Thread 2, Channel Outbound Handler, Channel Inbound Handler, ChannelPipeline]
  20. Write Semantics syscalls are expensive…

  21. Write Semantics

    • Channel.write(…) will only put messages in the ChannelOutboundBuffer once processed.
    • Channel.flush() will flush everything in the ChannelOutboundBuffer and so call writev(…).

    [Diagram: Channel.write(…) queues each msg in the ChannelOutboundBuffer (Java); Channel.flush() hands the queued msgs to writev(…) (JNI / C)]
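
The split between write and flush can be sketched as a queue that is only drained on flush. This is a JDK-only illustration in which a list of batches stands in for gathering-write syscalls; `BatchingChannel` and its methods are hypothetical and not Netty's ChannelOutboundBuffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of write/flush batching; not Netty's ChannelOutboundBuffer.
final class BatchingChannel {
    private final ArrayDeque<String> outboundBuffer = new ArrayDeque<>();
    // Each entry stands in for one gathering write (one syscall).
    private final List<List<String>> simulatedSyscalls = new ArrayList<>();

    // Only queues the message; nothing is sent yet.
    void write(String msg) {
        outboundBuffer.add(msg);
    }

    // Sends everything queued so far in a single simulated syscall.
    void flush() {
        if (outboundBuffer.isEmpty()) {
            return;
        }
        simulatedSyscalls.add(new ArrayList<>(outboundBuffer));
        outboundBuffer.clear();
    }

    int syscallCount() {
        return simulatedSyscalls.size();
    }

    List<String> lastBatch() {
        return simulatedSyscalls.isEmpty()
                ? Collections.<String>emptyList()
                : simulatedSyscalls.get(simulatedSyscalls.size() - 1);
    }
}

final class BatchingChannelDemo {
    public static void main(String[] args) {
        BatchingChannel channel = new BatchingChannel();
        channel.write("msg1");
        channel.write("msg2");
        channel.write("msg3");
        channel.flush();
        // Three writes, one simulated syscall.
        System.out.println(channel.syscallCount() + " syscall for " + channel.lastBatch());
    }
}
```

Flushing after every write would turn this back into one syscall per message, which is the cost the slide warns about.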
  22. Read Semantics Fine grained control FTW

  23. Read Semantics

    • ChannelConfig.setAutoRead(boolean) to the rescue.
    • ChannelConfig.setMaxMessagesPerRead(int) allows limiting the max number of messages to read.
    • Channel.read() allows explicitly triggering a read.
    • RecvByteBufAllocator gives even more flexibility

        while (i < messagesPerRead) { read(…); }
  24. IO - Threads Never-ever block the IO-Thread!

  25. EventLoop(Group)

    • IO Thread abstracted as EventLoop
    • easily share the same EventLoop between Server and Client
    • be able to explicitly use same EventLoop for accepted connection and outbound connection (win for proxy applications!)
    • Bonus: EventLoop is also a ScheduledEventExecutor

        for (;;) {
            waitForEventsOrTasks();
            processEvents();
            processTasks();
            processScheduledTasks();
        }
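
The loop on this slide can be sketched as a single thread draining a task queue. The version below is a reduced, JDK-only illustration that handles submitted tasks only (no I/O events and no scheduled tasks); `MiniEventLoop` and its methods are hypothetical names, not Netty's EventLoop.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a single-threaded event loop; not Netty's EventLoop.
final class MiniEventLoop implements Runnable {
    private static final Runnable SHUTDOWN = () -> { }; // marker task that stops the loop
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final Thread thread = new Thread(this, "mini-event-loop");

    void start() {
        thread.start();
    }

    // May be called from any thread; the task itself always runs on the loop thread.
    void execute(Runnable task) {
        tasks.add(task);
    }

    boolean inEventLoop() {
        return Thread.currentThread() == thread;
    }

    // Runs all tasks queued so far, then stops the loop and waits for its thread.
    void shutdownAndJoin() {
        tasks.add(SHUTDOWN);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void run() {
        try {
            for (;;) {
                Runnable task = tasks.take(); // wait for tasks
                if (task == SHUTDOWN) {
                    return;
                }
                task.run();                   // process tasks one at a time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class MiniEventLoopDemo {
    public static void main(String[] args) {
        MiniEventLoop loop = new MiniEventLoop();
        loop.start();
        int[] counter = new int[1]; // plain state, only touched by the loop thread
        for (int i = 0; i < 1000; i++) {
            loop.execute(() -> counter[0]++);
        }
        loop.shutdownAndJoin();
        System.out.println(counter[0]); // prints 1000
    }
}
```

Because every task runs on the same thread, the counter needs no locks or atomics. It also shows why a blocking task is so harmful: everything queued behind it has to wait.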
  26. Work outside the IO-Thread sometimes you need to block

  27. EventExecutor(Group)

    • part of the core itself
    • adding ChannelHandler with an EventExecutorGroup will get the job done
    • different EventExecutorGroup implementations for serial / non-serial executions.
    • supports moving work to other EventLoop

        ChannelPipeline pipeline = …;
        pipeline.addLast(executorGroup, new ExecutionHandler(…));
  28. JNI based SSLEngine … to the rescue

    [Diagram: Java, JNI, C/C++]
  29. SSLEngine implementations

    [Charts: Requests / Sec (axis 0 to 600000) and Transfer(MB) / Sec (axis 0 to 70), each comparing OpenSslEngine and SSLEngineImpl]
  30. SSLEngine implementations: OpenSslEngine VS SSLEngineImpl

  31. OpenSslEngine

    • drop in replacement for JDK SSLEngine (SSLEngineImpl)
    • gives you up to 6 x performance
    • less memory usage
    • less GC

        SslContextBuilder.forServer(…)
            .sslProvider(SslProvider.OPENSSL);
  32. Netty and the JVM A Hate-Love-Relationship

  33. Direct memory management

    • the whole idea of managing direct memory via the Garbage-Collector is fundamentally broken
    • static synchronized in allocation and deallocation methods of direct memory
    • there is also Thread.sleep(100) and System.gc() ?!?

    Now you made me cry…
  34. Memory Layout - ENOCONTROL

    • no easy way to control memory layout (all these hacks ….)
    • false-sharing is a real issue on own data-structures
    • @Contended does not help at all in practice

    Gimme more control now!
  35. JNI

    • nasty “hacks” needed to be able to get good performance
    • includes things like writing structs directly via sun.misc.Unsafe (no joke!)
    • calling from JNI into Java methods is SUPER-expensive

    [Diagram: Java, JNI, C/C++]
  36. NIO / IO and others

    • NIO.2 no real improvement over NIO
    • too much garbage produced and so GC overhead
    • ByteBuffer API is not user-friendly (flip all the things!)
    • IOException / ConnectException are too generic and not useful
    • creating String from byte[] / char[] not possible without memory copy
    • java.util.concurrent.Future was (and still is) a disaster
  37. Get my book… Ka-ching!

  38. Questions?

  39. Thanks!