
High Performance Networking on the JVM - Lessons learned


This presentation was held as part of JAX 2013 in Mainz

Norman Maurer

April 23, 2013


Transcript

  1. HIGH PERFORMANCE
    NETWORKING ON THE JVM
    LESSONS LEARNED


  2. NORMAN MAURER
    Red Hat (JBoss) - EAP Core Team
    Former contractor for Apple Inc
    Author - Netty in Action
    Apache Software Foundation Member
    Netty / Vert.x Core-Developer and all things NIO
    Java and Scala
    Twitter: @normanmaurer
    Github: https://github.com/normanmaurer


  3. GENERAL
    As always: only optimize if you really need to!
    1000 concurrent connections != high-scale
    If you only need to handle a few hundred connections, use
    Blocking IO!
    Use a profiler to find issues, don't best-guess...
    Always test before and after changes, and don't forget to
    warm up!

  4. WHAT YOU WANT
    “For high performance with many concurrent
    connections you WANT to use NIO or NIO.2”

  5. WHAT YOU DON'T WANT
    “Create one Thread per connection and let the
    OS try to deal with thousands of threads”
    “If you want to.... good luck ;)”

  6. SOCKET OPTIONS GENERAL
    Some socket options can have a great impact
    This is true for both good and bad impact
    Only touch them if you know what they do

  7. MOST INTERESTING SOCKET OPTIONS
    TCP_NODELAY
    SO_SNDBUF
    SO_RCVBUF
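
A small sketch of how these options can be set with plain NIO via the Java 7 `StandardSocketOptions` API; the 64 KiB buffer sizes are arbitrary example values, not recommendations:

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

public class SocketOptionsExample {
    public static void main(String[] args) throws IOException {
        SocketChannel channel = SocketChannel.open();
        // Disable Nagle's algorithm: small writes go out immediately
        channel.setOption(StandardSocketOptions.TCP_NODELAY, true);
        // Hint the kernel send/receive buffer sizes (64 KiB is just an example)
        channel.setOption(StandardSocketOptions.SO_SNDBUF, 64 * 1024);
        channel.setOption(StandardSocketOptions.SO_RCVBUF, 64 * 1024);
        System.out.println(channel.getOption(StandardSocketOptions.TCP_NODELAY));
        channel.close();
    }
}
```

Note that the kernel may adjust the requested buffer sizes, so `getOption` on SO_SNDBUF/SO_RCVBUF can return a different value than the one set.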

  8. GC - PRESSURE
    “Allocate / Deallocate the shit out of it!”


  9. SOLVE GC-PRESSURE
    Try to minimize allocation / deallocation of objects
    Use static instances wherever possible
    Ask yourself: do I really need to create this instance?
    BUT only cache/pool where it makes sense, as long-living
    objects may have a bad impact on GC as well
    “Rule of thumb: use static if it's immutable
    and used often. If it's mutable, only pool /
    cache if allocation costs are high!”
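
A minimal sketch of the "static immutable instance" rule of thumb; the event class name is invented for illustration:

```java
public final class ConnectionClosedEvent {
    // Safe to share across threads and handlers: the event carries no mutable state
    public static final ConnectionClosedEvent INSTANCE = new ConnectionClosedEvent();

    private ConnectionClosedEvent() { }

    public static void main(String[] args) {
        // Every "allocation" hands out the same object: zero GC pressure
        System.out.println(ConnectionClosedEvent.INSTANCE == ConnectionClosedEvent.INSTANCE);
    }
}
```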

  10. GC-PRESSURE
    “Every time I hear allocation / deallocation of
    objects is a no-brainer a kitten dies!”


  11. GC - PRESSURE
    “But I never had GC-Pressure ....”
    “ Well, you just haven't pushed your system hard
    enough! ”

  12. SOURCE OF GC-PRESSURE IN ACTION
    https://github.com/netty/netty/issues/973
    BAD
    channelIdle(ctx, new IdleStateEvent(IdleState.READER_IDLE,
        readerIdleCount++, currentTime - lastReadTime));

    BETTER!
    channelIdle(ctx, IdleStateEvent.READER_IDLE_EVENT);

  13. GARBAGE-COLLECTOR MATTERS
    The Garbage-Collector really matters
    Use a CMS-based collector or G1 if you want high
    throughput
    Size the different heap areas depending on your application / access
    pattern
    “Stop-the-world GC is your worst enemy if you
    want to push data hard”

  14. GARBAGE COLLECTOR
    Tuning the GC is kind of a "black art"
    Be sure you understand what you are doing
    GC-Tuning params are different per App

  15. BUFFERS
    Allocating / Deallocating direct buffers is expensive
    Allocating / Deallocating heap buffers is cheap
    “Freeing up the memory of direct buffers is
    expensive”
    “Unfortunately zeroing out the byte array of
    heap buffers is not for free either”

  16. BUFFERPOOLING TO THE RESCUE
    Pool buffers if you need to create a lot of them
    This is especially true for direct buffers
    There is also Unsafe.... but it's " unsafe " ;)
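
A deliberately minimal, single-threaded sketch of buffer pooling; real pools (such as Netty's pooled allocator) are far more sophisticated and thread-safe:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Not thread-safe: illustration of the pooling idea only
public class BufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        // Allocate the expensive direct buffer only when the pool is empty
        return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize);
    }

    public void release(ByteBuffer buf) {
        buf.clear();   // reset position/limit before reuse
        free.push(buf);
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4096);
        ByteBuffer a = pool.acquire();
        pool.release(a);
        // The same instance is handed out again instead of reallocating
        System.out.println(pool.acquire() == a);
    }
}
```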

  17. MEMORY FRAGMENTATION
    Memory fragmentation is bad, as you will waste memory
    It also means more frequent GC runs to remove fragmentation.
    “Can't insert an int here as we need 4 slots!”

  18. GATHERING WRITES / SCATTERING READS
    Use Gathering writes / Scattering reads
    Especially useful for protocols that can be assembled out of
    multiple buffers
    IMPORTANT: Gathering writes only work without a memory
    leak since Java 7 and late Java 6 releases.
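
A small sketch of a gathering write using `FileChannel`: one `write` call flushes a header buffer and a body buffer in order, instead of two separate writes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatheringWriteExample {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("gather", ".bin");
        ByteBuffer header = ByteBuffer.wrap("HDR:".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer body = ByteBuffer.wrap("payload".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            // One gathering call writes both buffers in sequence
            ch.write(new ByteBuffer[] { header, body });
        }
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

The same pattern works on a `SocketChannel`, where it reduces the number of write syscalls.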

  19. USE DIRECT BUFFERS FOR SOCKETS
    Use direct buffers when you do operations on sockets
    “WHY ?”
    “Internally the JDK* will copy the buffer
    content to a direct buffer if you don't use one”

  20. MINIMIZE SYSCALLS
    Also true for other operations that directly hit the OS
    Batch things up, but as always there is a tradeoff.
    “Only call Channel.write(...) / Channel.read(...)
    if you really need to!”

  21. MEMORY COPIES ARE NOT FOR FREE
    AVOID THEM!
    “ByteBuffer exposes operations like slice() and
    duplicate() for a good reason — use them!”
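
`slice()` creates a view over the same backing memory instead of copying it, as this small sketch demonstrates:

```java
import java.nio.ByteBuffer;

public class SliceExample {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put(new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 }).flip();
        buf.position(4);
        // slice() shares the backing memory from the current position: no copy
        ByteBuffer view = buf.slice();
        view.put(0, (byte) 42);            // write through the view...
        System.out.println(buf.get(4));    // ...is visible in the original buffer
    }
}
```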

  22. ZERO-MEMORY-COPY A.K.A FILECHANNEL
    Many operating systems support it
    Helps to write file content to a Channel in an efficient way
    “Only possible if you don't need to transform
    the data during transfer!”
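
A sketch of `FileChannel.transferTo(...)`; whether the OS can really avoid copying through user space depends on the target channel (with a socket target, the kernel can use sendfile-style transfer), but the API is the same:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TransferToExample {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("zerocopy", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel src = FileChannel.open(file)) {
            WritableByteChannel target = Channels.newChannel(sink);
            // Let the channel move the bytes; no intermediate user-space buffer needed
            src.transferTo(0, src.size(), target);
        }
        System.out.println(sink.toString("US-ASCII"));
        Files.delete(file);
    }
}
```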

  23. THROTTLE READS / WRITES / ACCEPTS
    Otherwise you will have fun with OOM
    interestOps(...) updates to the rescue!
    This will push the "burden" to the network stack
    https://github.com/netty/netty/issues/1024
    “But don't call interestOps(...) too often, it's
    expensive!”

  24. DON'T REGISTER FOR OP_WRITE
    Don't register for OP_WRITE on the Selector by default
    Only do so if you could not write the complete buffer
    Remove OP_WRITE from interestOps() after you were
    able to write
    “Remember: most of the time the Channel is
    writable!”
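
The rule above can be sketched like this; a `Pipe` is used so the snippet is self-contained, but in real code the key would belong to a `SocketChannel` in your event loop:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class OpWriteExample {
    // Register OP_WRITE only when a write could not complete; drop it once done
    static void updateInterest(SelectionKey key, ByteBuffer pending) {
        if (pending.hasRemaining()) {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        SelectionKey key = pipe.sink().register(selector, 0);

        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).flip();
        pipe.sink().write(buf);   // small write completes fully
        updateInterest(key, buf);
        // OP_WRITE stays deregistered, so select() will not spin on a writable channel
        System.out.println((key.interestOps() & SelectionKey.OP_WRITE) != 0);

        selector.close();
        pipe.sink().close();
        pipe.source().close();
    }
}
```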

  25. DON'T BLOCK!
    Don't block the Thread that handles the IO
    You may be surprised by what can block
    “I'm looking at you, DNS resolution!”
    “If you really need to block, move it to an extra
    ThreadPool”
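
A sketch of pushing a potentially blocking call (here DNS resolution) onto a separate pool so the IO thread stays responsive; the pool size of 4 is an arbitrary example:

```java
import java.net.InetAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OffloadBlockingExample {
    // Dedicated pool for blocking work, kept away from the IO event loop
    private static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4);

    public static void main(String[] args) throws Exception {
        // DNS resolution can block for seconds; run it off the caller's thread
        Future<String> address = BLOCKING_POOL.submit(
                () -> InetAddress.getByName("localhost").getHostAddress());
        // An IO thread would NOT call get() like this; it would register a
        // completion callback instead. get() is used here only to show the result.
        System.out.println(address.get());
        BLOCKING_POOL.shutdown();
    }
}
```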

  26. BLOCKING IN ACTION
    “RED != Good!”


  27. SELECTIONKEY OPERATIONS
    SelectionKey.interestOps(....);
    “This method may be invoked at any time.
    Whether or not it blocks, and for how long, is
    implementation-dependent”

  28. OPTIMIZE
    BAD
    public void suspendRead() {
        key.interestOps(key.interestOps() & ~OP_READ);
    }

    BETTER!
    public void suspendRead() {
        int ops = key.interestOps();
        if ((ops & OP_READ) != 0) {
            key.interestOps(ops & ~OP_READ);
        }
    }

  29. BE MEMORY EFFICIENT
    “When writing a system that handles 100k of
    concurrent connections, every byte of memory
    saved on long-living objects counts”

  30. MEMORY EFFICIENT ATOMIC
    AtomicReference => AtomicReferenceFieldUpdater
    AtomicBoolean => AtomicIntegerFieldUpdater
    AtomicLong => AtomicLongFieldUpdater
    AtomicInteger => AtomicIntegerFieldUpdater
    “It's ugly, but sometimes you just have to do
    it!”
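
A sketch of the field-updater pattern; the `Connection` class and its `state` field are invented for illustration. The updater is a single static object, so each instance only pays for a plain volatile `int` instead of a full `AtomicInteger` object:

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class Connection {
    // One static updater shared by all Connection instances
    private static final AtomicIntegerFieldUpdater<Connection> STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(Connection.class, "state");

    // The field must be volatile and accessible to the updater
    private volatile int state;   // 0 = closed, 1 = open

    public boolean open() {
        // Atomic compare-and-set without allocating an AtomicInteger per connection
        return STATE_UPDATER.compareAndSet(this, 0, 1);
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        System.out.println(c.open());   // first CAS succeeds
        System.out.println(c.open());   // second fails: already open
    }
}
```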

  31. DATA-STRUCTURE MATTERS
    Think about what data-structure fits best
    Linked vs. Array based
    Access pattern ?!?


  32. VOLATILE
    Volatile reads are cheap.... but still not for free
    Cache volatile variables in a local where possible to
    minimize the number of volatile accesses

  33. OPTIMIZE
    BAD
    private volatile Selector selector;
    public void method() .... {
        selector.select();
        ....
    }

    BETTER
    private volatile Selector selector;
    public void method() .... {
        Selector selector = this.selector;
        selector.select();
        ....
    }

  34. MINIMIZE STACKDEPTH
    Deep stacks are our enemies, because they are expensive
    Use tail-recursive calls if possible
    “WHY?”
    “Everything that needs to be kept alive until the call
    is done needs memory...”

  35. USE JDK7 IF POSSIBLE
    It also has some other goodies like: NIO.2, UDP Multicast,
    SCTP
    “Allocation / Deallocation of ByteBuffers is a
    lot faster now...”

  36. WELL DEFINED THREAD-MODEL
    It makes development easier
    Reduces context-switching
    Reduces the need for synchronization in many cases

  37. CHOOSE THE CORRECT PROTOCOL
    UDP
    TCP
    SCTP
    UDT
    “It's always a trade-off!”


  38. PIPELINING IS AWESOME
    Allows sending / receiving more than one message before a
    response
    This minimizes send / receive operations
    Popular protocols which support Pipelining: HTTP, SMTP,
    IMAP
    “ If you write your own protocol, think about
    Pipelining! ”

  39. DON'T WANT THE HASSLE?
    Netty
    Vert.x
    Xnio
    Grizzly
    Apache Mina
    “There are a few frameworks to the rescue....”

  40. WANT TO LEARN MORE
    “Attend my talk about Netty 4 tomorrow ;)”


  41. QUESTIONS?


  42. THANKS
