High Performance Networking on the JVM - Lessons learned


This presentation was held at JAX 2013 in Mainz.


Norman Maurer

April 23, 2013

Transcript

  1. NORMAN MAURER (slide 2)
     - Red Hat (JBoss) - EAP Core Team
     - Former contractor for Apple Inc
     - Author of "Netty in Action"
     - Apache Software Foundation member
     - Netty / Vert.x core developer and all things NIO, Java and Scala
     - Twitter: @normanmaurer
     - GitHub: https://github.com/normanmaurer
  2. GENERAL (slide 3)
     - As always, only optimize if you really need to!
     - 1000 concurrent connections != high scale
     - If you only need to handle a few hundred connections, use blocking IO!
     - Use a profiler to find issues instead of best-guessing
     - Always test before and after changes, and don't forget to warm up!
  3. WHAT YOU DON'T WANT (slide 5)
     - "Create one thread per connection and let the OS try to deal with thousands of threads"
     - "If you want to... good luck ;)"
  4. SOCKET OPTIONS - GENERAL (slide 6)
     - Some socket options can have a great impact
     - This is true for both good and bad impact
     - Only touch them if you know what they do
  5. SOLVE GC-PRESSURE (slide 9)
     - Try to minimize allocation / deallocation of objects
     - Use static instances wherever possible
     - Ask yourself: do I really need to create this instance?
     - BUT, only cache/pool where it makes sense, as long-living objects can have a bad impact on GC as well
     - "Rule of thumb: use static if it's immutable and used often. If it's mutable, only pool / cache if allocation costs are high!"
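The "static if it's immutable and used often" rule can be sketched like this. The `ReadTimeoutEvent` class and its names are hypothetical, invented for illustration; Netty applies the same idea with shared constants such as `IdleStateEvent.READER_IDLE_EVENT`:

```java
// Sketch: share one static instance of a hypothetical immutable event
// instead of allocating a new one on every occurrence.
final class ReadTimeoutEvent {
    // Immutable and used often -> a single shared instance is safe.
    static final ReadTimeoutEvent INSTANCE = new ReadTimeoutEvent();
    private ReadTimeoutEvent() { }
}

public class StaticInstanceDemo {
    static ReadTimeoutEvent nextEvent() {
        return ReadTimeoutEvent.INSTANCE; // no allocation on the hot path
    }

    public static void main(String[] args) {
        // Both calls return the very same object: zero garbage produced.
        System.out.println(nextEvent() == nextEvent());
    }
}
```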
  6. GC-PRESSURE (slide 11)
     - "But I never had GC pressure..."
     - "Well, you haven't pushed your system hard enough!"
  7. SOURCE OF GC-PRESSURE IN ACTION (slide 12)
     https://github.com/netty/netty/issues/973

     BAD:
         channelIdle(ctx, new IdleStateEvent(IdleState.READER_IDLE, readerIdleCount++, currentTime - lastReadTime));

     BETTER:
         channelIdle(ctx, IdleStateEvent.READER_IDLE_EVENT);
  8. GARBAGE-COLLECTOR MATTERS (slide 13)
     - The garbage collector really matters
     - Use a CMS-based collector, or G1 if you want high throughput
     - Size the different areas depending on your application / access pattern
     - "Stop-the-world GC is your worst enemy if you want to push data hard"
  9. GARBAGE COLLECTOR (slide 14)
     - Tuning the GC is kind of a "black art"
     - Be sure you understand what you are doing
     - GC tuning parameters differ per application
  10. BUFFERS (slide 15)
      - Allocating / deallocating direct buffers is expensive
      - Allocating / deallocating heap buffers is cheap
      - "Freeing the memory of direct buffers is expensive"
      - "Unfortunately, zeroing out the byte array of heap buffers is not free either"
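A minimal sketch of the two buffer kinds the slide contrasts: heap buffers are backed by a garbage-collected `byte[]` (zeroed on allocation), direct buffers live in native memory outside the heap:

```java
// Sketch contrasting heap and direct ByteBuffers.
import java.nio.ByteBuffer;

public class BufferKindsDemo {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] the GC manages. Cheap to allocate,
        // but the backing array is zeroed out first.
        ByteBuffer heap = ByteBuffer.allocate(1024);

        // Direct buffer: native memory outside the heap. Expensive to
        // allocate and free, but avoids copies when talking to sockets.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.hasArray());   // true - backed by byte[]
        System.out.println(direct.isDirect()); // true - native memory
    }
}
```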
  11. BUFFER POOLING TO THE RESCUE (slide 16)
      - Pool buffers if you need to create a lot of them
      - This is especially true for direct buffers
      - There is also Unsafe... but it's "unsafe" ;)
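To illustrate the idea, here is a deliberately simple, single-threaded buffer pool. It is a sketch only; production pools (such as the ones Netty ships) handle thread safety, size classes, and leak detection:

```java
// A minimal (not thread-safe) direct-buffer pool, for illustration only.
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

public class SimpleBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int capacity;

    public SimpleBufferPool(int capacity) {
        this.capacity = capacity;
    }

    /** Reuse a pooled buffer if one is free, otherwise pay for a new one. */
    public ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : ByteBuffer.allocateDirect(capacity);
    }

    /** Reset the buffer and return it so the next acquire() can reuse it. */
    public void release(ByteBuffer buf) {
        buf.clear();
        free.offer(buf);
    }

    public static void main(String[] args) {
        SimpleBufferPool pool = new SimpleBufferPool(4096);
        ByteBuffer first = pool.acquire();
        pool.release(first);
        // The expensive direct buffer is reused instead of re-allocated.
        System.out.println(pool.acquire() == first);
    }
}
```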
  12. MEMORY FRAGMENTATION (slide 17)
      - Memory fragmentation is bad, as you waste memory
      - It also causes more frequent GC runs to remove the fragmentation
      - "Can't insert an int here, as we need 4 slots!"
  13. GATHERING WRITES / SCATTERING READS (slide 18)
      - Use gathering writes / scattering reads
      - Especially useful for protocols whose messages can be assembled out of multiple buffers
      - IMPORTANT: gathering writes only work without a memory leak since Java 7 and late Java 6 releases
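A sketch of a gathering write: a protocol "header" and "body" live in separate buffers and go out in a single call. A `FileChannel` stands in for the socket here so the example is self-contained; the `LEN:` framing is invented for illustration:

```java
// Gathering write: one write() call for multiple buffers.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatheringWriteDemo {
    /** Writes header + body with one gathering write, then reads it back. */
    static String writeAndReadBack() throws IOException {
        ByteBuffer header = ByteBuffer.wrap("LEN:5\n".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer body = ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII));

        Path file = Files.createTempFile("gather", ".bin");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            // One gathering write instead of one write() syscall per buffer.
            ch.write(new ByteBuffer[] { header, body });
        }
        String content = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        Files.delete(file);
        return content;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadBack());
    }
}
```

The same `ByteBuffer[]` overloads exist on `SocketChannel`, which is where the syscall savings matter in practice.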
  14. USE DIRECT BUFFERS FOR SOCKETS (slide 19)
      - Use direct buffers when you do operations on sockets
      - "WHY?"
      - "Internally, the JDK* will copy the buffer content to a direct buffer if you don't use one"
  15. MINIMIZE SYSCALLS (slide 20)
      - Also true for other operations that directly hit the OS
      - Batch things up, but as always there is a tradeoff
      - "Only call Channel.write(...) / Channel.read(...) if you really need to!"
  16. MEMORY COPIES ARE NOT FOR FREE (slide 21)
      - Use slice() / duplicate() views instead of copying
      - "ByteBuffer exposes operations like slice() and duplicate() for a good reason"
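A small sketch of why these views avoid copies: `slice()` (like `duplicate()`) creates a new `ByteBuffer` over the same backing memory, so writes through the view are visible in the original:

```java
// slice() creates a view over the same backing memory - no bytes copied.
import java.nio.ByteBuffer;

public class BufferViewDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        buf.flip();

        buf.position(4);
        ByteBuffer tail = buf.slice(); // view of the last 4 bytes, no copy

        tail.put(0, (byte) 42);         // write through the view...
        System.out.println(buf.get(4)); // ...and the original sees it: 42
    }
}
```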
  17. ZERO-MEMORY-COPY A.K.A. FILECHANNEL (slide 22)
      - Many operating systems support it
      - Helps to write file content to a channel in an efficient way
      - "Only possible if you don't need to transform the data during the transfer!"
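A sketch of `FileChannel.transferTo(...)`, which lets the OS move file bytes to the target channel, potentially without copying them through user space. A file-to-file copy stands in for the usual file-to-socket case so the example runs anywhere:

```java
// Zero-copy-style transfer with FileChannel.transferTo(...).
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToDemo {
    /** Copies src's content to a second file via transferTo and returns it. */
    static String copyViaTransferTo(String payload) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        Files.write(src, payload.getBytes(StandardCharsets.US_ASCII));

        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            long pos = 0, size = in.size();
            // transferTo may move fewer bytes than requested, so loop.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
        String content = new String(Files.readAllBytes(dst), StandardCharsets.US_ASCII);
        Files.delete(src);
        Files.delete(dst);
        return content;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(copyViaTransferTo("zero copy!"));
    }
}
```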
  18. THROTTLE READS / WRITES / ACCEPTS (slide 23)
      - Otherwise you will have fun with OOM
      - interestOps(...) updates to the rescue!
      - This pushes the "burden" to the network stack
      - https://github.com/netty/netty/issues/1024
      - "But don't call interestOps(...) too often, it's expensive!"
  19. DON'T REGISTER FOR OP_WRITE (slide 24)
      - Don't register for OP_WRITE on the Selector by default
      - Only do so if you could not write the complete buffer
      - Remove OP_WRITE from interestOps() after you were able to write again
      - "Remember: most of the time the Channel is writable!"
  20. DON'T BLOCK! (slide 25)
      - Don't block the thread that handles the IO
      - You may be surprised what can block
      - "I'm looking at you, DNS resolution!"
      - "If you really need to block, move it to an extra ThreadPool"
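A sketch of moving a blocking call off the IO thread into a dedicated pool. `InetAddress.getByName(...)` is the classic silent blocker the slide warns about; a short sleep stands in for it here so the example runs without network access, and all names are illustrative:

```java
// Offload blocking work to a separate ExecutorService so the IO event
// loop thread never waits on it.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OffloadBlockingDemo {
    // Dedicated pool for blocking work, separate from the IO threads.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(2);

    static Future<String> resolveAsync(String host) {
        return BLOCKING_POOL.submit(() -> {
            Thread.sleep(50); // stand-in for a blocking DNS lookup
            return host + " -> 127.0.0.1";
        });
    }

    public static void main(String[] args) throws Exception {
        Future<String> result = resolveAsync("example.com");
        // The IO thread would keep handling events here instead of blocking.
        System.out.println(result.get());
        BLOCKING_POOL.shutdown();
    }
}
```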
  21. SELECTIONKEY OPERATIONS (slide 27)
      SelectionKey.interestOps(....);
      - "This method may be invoked at any time. Whether or not it blocks, and for how long, is implementation-dependent"
  22. OPTIMIZE (slide 28)

      BAD:
          public void suspendRead() {
              key.interestOps(key.interestOps() & ~OP_READ);
          }

      BETTER:
          public void suspendRead() {
              int ops = key.interestOps();
              if ((ops & OP_READ) != 0) {
                  key.interestOps(ops & ~OP_READ);
              }
          }
  23. BE MEMORY EFFICIENT (slide 29)
      - "When writing a system that handles 100k concurrent connections, every bit of memory saved on long-living objects counts"
  24. MEMORY-EFFICIENT ATOMICS (slide 30)
      - AtomicReference => AtomicReferenceFieldUpdater
      - AtomicBoolean   => AtomicIntegerFieldUpdater
      - AtomicLong      => AtomicLongFieldUpdater
      - AtomicInteger   => AtomicIntegerFieldUpdater
      - "It's ugly, but sometimes you just have to do it!"
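A sketch of the `AtomicInteger` => `AtomicIntegerFieldUpdater` swap; the `Connection` class and field names are hypothetical. Instead of one extra `AtomicInteger` object per connection, a single static updater operates on a plain volatile `int` field:

```java
// One static field updater shared by all instances, working on a plain
// volatile int - no per-instance AtomicInteger allocation.
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class Connection {
    private static final AtomicIntegerFieldUpdater<Connection> PENDING_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(Connection.class, "pendingWrites");

    // The field must be a volatile int for the updater to work.
    private volatile int pendingWrites;

    public int incrementPendingWrites() {
        return PENDING_UPDATER.incrementAndGet(this);
    }

    public int pendingWrites() {
        return pendingWrites;
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        c.incrementPendingWrites();
        c.incrementPendingWrites();
        System.out.println(c.pendingWrites()); // 2
    }
}
```

With 100k connections, that is 100k fewer `AtomicInteger` objects per such counter, at the cost of the uglier updater boilerplate the slide admits to.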
  25. VOLATILE (slide 32)
      - Volatile reads are cheap... but still not free
      - Cache volatile variables in a local where possible to minimize flushes
  26. OPTIMIZE (slide 33)

      BAD:
          private volatile Selector selector;

          public void method() .... {
              selector.select();
              ....
          }

      BETTER:
          private volatile Selector selector;

          public void method() .... {
              Selector selector = this.selector;
              selector.select();
              ....
          }
  27. MINIMIZE STACK DEPTH (slide 34)
      - Deep stacks are our enemies, because they are expensive
      - Use tail-recursive calls if possible
      - "WHY?"
      - "Everything that needs to be stored until the call is done needs memory..."
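One caveat worth sketching: the JVM does not perform tail-call elimination, so in Java even a tail-recursive call keeps a frame per invocation. The practical version of the advice is to rewrite deep recursion as a loop; the example below is illustrative, not from the talk:

```java
// Deep recursion vs. a loop: same result, very different stack usage.
public class StackDepthDemo {
    // Tail-recursive in shape, but the JVM still allocates one frame per
    // call -> StackOverflowError for large n.
    static long sumRecursive(long n, long acc) {
        if (n == 0) return acc;
        return sumRecursive(n - 1, acc + n);
    }

    // Iterative form: constant stack depth regardless of n.
    static long sumIterative(long n) {
        long acc = 0;
        while (n > 0) {
            acc += n--;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumIterative(1_000_000)); // fine: 500000500000
        // sumRecursive(1_000_000, 0) would likely blow the stack instead.
    }
}
```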
  28. USE JDK 7 IF POSSIBLE (slide 35)
      - Also has some other goodies like NIO.2, UDP multicast, SCTP
      - "Allocation / deallocation of ByteBuffers is a lot faster now..."
  29. PIPELINING IS AWESOME (slide 38)
      - Allows sending / receiving more than one message before a response
      - This minimizes send / receive operations
      - Popular protocols which support pipelining: HTTP, SMTP, IMAP
      - "If you write your own protocol, think about pipelining!"
  30. DON'T WANT THE HASSLE? (slide 39)
      - Netty
      - Vert.x
      - XNIO
      - Grizzly
      - Apache MINA
      - "There are a few frameworks to the rescue..."