
High Performance Networking on the JVM - Lessons learned


This presentation was held as part of JAX 2013 in Mainz

Norman Maurer

April 23, 2013


Transcript

  1. HIGH PERFORMANCE
    NETWORKING ON THE JVM
    LESSONS LEARNED


  2. NORMAN MAURER
    Red Hat (JBoss) - EAP Core Team
    Former contractor for Apple Inc
    Author - Netty in Action
    Apache Software Foundation Member
    Netty / Vert.x Core-Developer and all things NIO
    Java and Scala
    Twitter: @normanmaurer
    Github: https://github.com/normanmaurer


  3. GENERAL
    As always: only optimize if you really need to!
    1000 concurrent connections != high-scale
    If you only need to handle a few hundred connections, use
    Blocking IO!
    Use a profiler to find issues, don't best-guess...
    Always test before and after changes, and don't forget to
    warm up!

  4. WHAT YOU WANT
    “For high performance with many concurrent
    connections you WANT to use NIO or NIO.2”

  5. WHAT YOU DON'T WANT
    “Create one Thread per connection and let the
    OS try to deal with thousands of threads”
    “If you want to.... good luck ;)”

  6. SOCKET OPTIONS GENERAL
    Some socket options can have a great impact
    This is true for both good and bad impact
    Only touch them if you know what they do

  7. MOST INTERESTING SOCKET OPTIONS
    TCP_NODELAY
    SO_SNDBUF
    SO_RCVBUF
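
A small sketch of how these options can be set with plain NIO via the Java 7 `StandardSocketOptions` API; the 64 KiB buffer sizes are arbitrary example values, not recommendations:

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

public class SocketOptionsExample {
    public static void main(String[] args) throws IOException {
        SocketChannel channel = SocketChannel.open();
        // Disable Nagle's algorithm: small writes go out immediately
        channel.setOption(StandardSocketOptions.TCP_NODELAY, true);
        // Hint the kernel send/receive buffer sizes (64 KiB is just an example)
        channel.setOption(StandardSocketOptions.SO_SNDBUF, 64 * 1024);
        channel.setOption(StandardSocketOptions.SO_RCVBUF, 64 * 1024);
        System.out.println(channel.getOption(StandardSocketOptions.TCP_NODELAY));
        channel.close();
    }
}
```

Note that the kernel may adjust the requested buffer sizes, so `getOption` on SO_SNDBUF/SO_RCVBUF can return a different value than the one set.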

  8. GC - PRESSURE
    “Allocate / Deallocate the shit out of it!”


  9. SOLVE GC-PRESSURE
    Try to minimize allocation / deallocation of objects
    Use static instances wherever possible
    Ask yourself: do I really need to create this instance?
    BUT only cache/pool where it makes sense, as long-living
    objects may have a bad impact on GC as well
    “Rule of thumb: use static if it's immutable
    and used often. If it's mutable, only pool /
    cache if allocation costs are high!”
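
A minimal sketch of the "static immutable instance" rule of thumb; the event class name is invented for illustration:

```java
public final class ConnectionClosedEvent {
    // Safe to share across threads and handlers: the event carries no mutable state
    public static final ConnectionClosedEvent INSTANCE = new ConnectionClosedEvent();

    private ConnectionClosedEvent() { }

    public static void main(String[] args) {
        // Every "allocation" hands out the same object: zero GC pressure
        System.out.println(ConnectionClosedEvent.INSTANCE == ConnectionClosedEvent.INSTANCE);
    }
}
```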

  10. GC-PRESSURE
    “Every time I hear allocation / deallocation of
    objects is a no-brainer a kitten dies!”


  11. GC - PRESSURE
    “But I never had GC-Pressure ....”
    “ Well, you just haven't pushed your system hard
    enough! ”

  12. SOURCE OF GC-PRESSURE IN ACTION
    https://github.com/netty/netty/issues/973
    BAD
    channelIdle(ctx, new IdleStateEvent(IdleState.READER_IDLE,
        readerIdleCount++, currentTime - lastReadTime));

    BETTER!
    channelIdle(ctx, IdleStateEvent.READER_IDLE_EVENT);

  13. GARBAGE-COLLECTOR MATTERS
    The Garbage-Collector really matters
    Use a CMS-based collector or G1 if you want high
    throughput
    Size the different heap areas depending on your application / access
    pattern
    “Stop-the-world GC is your worst enemy if you
    want to push data hard”

  14. GARBAGE COLLECTOR
    Tuning the GC is kind of a "black art"
    Be sure you understand what you are doing
    GC-Tuning params are different per App

  15. BUFFERS
    Allocating / Deallocating direct buffers is expensive
    Allocating / Deallocating heap buffers is cheap
    “Freeing up the memory of direct buffers is
    expensive”
    “Unfortunately zeroing out the byte array of
    heap buffers is not for free either”

  16. BUFFERPOOLING TO THE RESCUE
    Pool buffers if you need to create a lot of them
    This is especially true for direct buffers
    There is also Unsafe.... but it's " unsafe " ;)
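
A deliberately minimal, single-threaded sketch of buffer pooling; real pools (such as Netty's pooled allocator) are far more sophisticated and thread-safe:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Not thread-safe: illustration of the pooling idea only
public class BufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        // Allocate the expensive direct buffer only when the pool is empty
        return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize);
    }

    public void release(ByteBuffer buf) {
        buf.clear();   // reset position/limit before reuse
        free.push(buf);
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4096);
        ByteBuffer a = pool.acquire();
        pool.release(a);
        // The same instance is handed out again instead of reallocating
        System.out.println(pool.acquire() == a);
    }
}
```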

  17. MEMORY FRAGMENTATION
    Memory fragmentation is bad, as you will waste memory
    It also means more frequent GC runs to remove fragmentation.
    “Can't insert an int here as we need 4 slots!”

  18. GATHERING WRITES / SCATTERING READS
    Use Gathering writes / Scattering reads
    Especially useful for protocols that can be assembled out of
    multiple buffers
    IMPORTANT: Gathering writes only work without a memory
    leak since Java 7 and late Java 6 releases.
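
A small sketch of a gathering write using `FileChannel`: one `write` call flushes a header buffer and a body buffer in order, instead of two separate writes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatheringWriteExample {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("gather", ".bin");
        ByteBuffer header = ByteBuffer.wrap("HDR:".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer body = ByteBuffer.wrap("payload".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            // One gathering call writes both buffers in sequence
            ch.write(new ByteBuffer[] { header, body });
        }
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

The same pattern works on a `SocketChannel`, where it reduces the number of write syscalls.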

  19. USE DIRECT BUFFERS FOR SOCKETS
    Use direct buffers when you do operations on sockets
    “WHY ?”
    “Internally the JDK* will copy the buffer
    content to a direct buffer if you don't use one”

  20. MINIMIZE SYSCALLS
    Also true for other operations that directly hit the OS
    Batch things up, but as always there is a tradeoff.
    “Only call Channel.write(...) / Channel.read(...)
    if you really need to!”

  21. MEMORY COPIES ARE NOT FOR FREE
    AVOID THEM!
    “ByteBuffer exposes operations like slice() and
    duplicate() for a good reason — use them!”
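
`slice()` creates a view over the same backing memory instead of copying it, as this small sketch demonstrates:

```java
import java.nio.ByteBuffer;

public class SliceExample {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put(new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 }).flip();
        buf.position(4);
        // slice() shares the backing memory from the current position: no copy
        ByteBuffer view = buf.slice();
        view.put(0, (byte) 42);            // write through the view...
        System.out.println(buf.get(4));    // ...is visible in the original buffer
    }
}
```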

  22. ZERO-MEMORY-COPY A.K.A FILECHANNEL
    Many operating systems support it
    Helps to write file content to a Channel in an efficient way
    “Only possible if you don't need to transform
    the data during transfer!”
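
A sketch of `FileChannel.transferTo(...)`; whether the OS can really avoid copying through user space depends on the target channel (with a socket target, the kernel can use sendfile-style transfer), but the API is the same:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TransferToExample {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("zerocopy", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel src = FileChannel.open(file)) {
            WritableByteChannel target = Channels.newChannel(sink);
            // Let the channel move the bytes; no intermediate user-space buffer needed
            src.transferTo(0, src.size(), target);
        }
        System.out.println(sink.toString("US-ASCII"));
        Files.delete(file);
    }
}
```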

  23. THROTTLE READS / WRITES / ACCEPTS
    Otherwise you will have fun with OOM
    interestOps(...) updates to the rescue!
    This will push the "burden" to the network stack
    https://github.com/netty/netty/issues/1024
    “But don't call interestOps(...) too often, it's
    expensive!”

  24. DON'T REGISTER FOR OP_WRITE
    Don't register for OP_WRITE on the Selector by default
    Only do so if you could not write the complete buffer
    Remove OP_WRITE from interestOps() after you were
    able to write
    “Remember: most of the time the Channel is
    writable!”
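
The rule above can be sketched like this; a `Pipe` is used so the snippet is self-contained, but in real code the key would belong to a `SocketChannel` in your event loop:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class OpWriteExample {
    // Register OP_WRITE only when a write could not complete; drop it once done
    static void updateInterest(SelectionKey key, ByteBuffer pending) {
        if (pending.hasRemaining()) {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        SelectionKey key = pipe.sink().register(selector, 0);

        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).flip();
        pipe.sink().write(buf);   // small write completes fully
        updateInterest(key, buf);
        // OP_WRITE stays deregistered, so select() will not spin on a writable channel
        System.out.println((key.interestOps() & SelectionKey.OP_WRITE) != 0);

        selector.close();
        pipe.sink().close();
        pipe.source().close();
    }
}
```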

  25. DON'T BLOCK!
    Don't block the Thread that handles the IO
    You may be surprised by what can block
    “I'm looking at you, DNS resolution!”
    “If you really need to block, move it to an extra
    ThreadPool”
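
A sketch of pushing a potentially blocking call (here DNS resolution) onto a separate pool so the IO thread stays responsive; the pool size of 4 is an arbitrary example:

```java
import java.net.InetAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OffloadBlockingExample {
    // Dedicated pool for blocking work, kept away from the IO event loop
    private static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4);

    public static void main(String[] args) throws Exception {
        // DNS resolution can block for seconds; run it off the caller's thread
        Future<String> address = BLOCKING_POOL.submit(
                () -> InetAddress.getByName("localhost").getHostAddress());
        // An IO thread would NOT call get() like this; it would register a
        // completion callback instead. get() is used here only to show the result.
        System.out.println(address.get());
        BLOCKING_POOL.shutdown();
    }
}
```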

  26. BLOCKING IN ACTION
    “RED != Good!”


  27. SELECTIONKEY OPERATIONS
    SelectionKey.interestOps(....);
    “This method may be invoked at any time.
    Whether or not it blocks, and for how long, is
    implementation-dependent”

  28. OPTIMIZE
    BAD
    public void suspendRead() {
        key.interestOps(key.interestOps() & ~OP_READ);
    }

    BETTER!
    public void suspendRead() {
        int ops = key.interestOps();
        if ((ops & OP_READ) != 0) {
            key.interestOps(ops & ~OP_READ);
        }
    }

  29. BE MEMORY EFFICIENT
    “When writing a system that handles 100k of
    concurrent connections, every byte of memory
    saved on long-living objects counts”

  30. MEMORY EFFICIENT ATOMIC
    AtomicReference => AtomicReferenceFieldUpdater
    AtomicBoolean => AtomicIntegerFieldUpdater
    AtomicLong => AtomicLongFieldUpdater
    AtomicInteger => AtomicIntegerFieldUpdater
    “It's ugly, but sometimes you just have to do
    it!”
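
A sketch of the field-updater pattern; the `Connection` class and its `state` field are invented for illustration. The updater is a single static object, so each instance only pays for a plain volatile `int` instead of a full `AtomicInteger` object:

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class Connection {
    // One static updater shared by all Connection instances
    private static final AtomicIntegerFieldUpdater<Connection> STATE_UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(Connection.class, "state");

    // The field must be volatile and accessible to the updater
    private volatile int state;   // 0 = closed, 1 = open

    public boolean open() {
        // Atomic compare-and-set without allocating an AtomicInteger per connection
        return STATE_UPDATER.compareAndSet(this, 0, 1);
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        System.out.println(c.open());   // first CAS succeeds
        System.out.println(c.open());   // second fails: already open
    }
}
```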

  31. DATA-STRUCTURE MATTERS
    Think about what data-structure fits best
    Linked vs. Array based
    Access pattern ?!?


  32. VOLATILE
    Volatile reads are cheap.... but still not for free
    Cache volatile variables in a local where possible to
    minimize the number of volatile accesses

  33. OPTIMIZE
    BAD
    private volatile Selector selector;
    public void method() .... {
        selector.select();
        ....
    }

    BETTER
    private volatile Selector selector;
    public void method() .... {
        Selector selector = this.selector;
        selector.select();
        ....
    }

  34. MINIMIZE STACKDEPTH
    Deep stacks are our enemies, because they are expensive
    Use tail-recursive calls if possible
    “WHY?”
    “Everything that needs to be kept alive until the call
    is done needs memory...”

  35. USE JDK7 IF POSSIBLE
    It also has some other goodies like: NIO.2, UDP Multicast,
    SCTP
    “Allocation / Deallocation of ByteBuffers is a
    lot faster now...”

  36. WELL DEFINED THREAD-MODEL
    It makes development easier
    Reduces context-switching
    Reduces the need for synchronization in many cases

  37. CHOOSE THE CORRECT PROTOCOL
    UDP
    TCP
    SCTP
    UDT
    “It's always a trade-off!”


  38. PIPELINING IS AWESOME
    Allows sending / receiving more than one message before a
    response
    This minimizes send / receive operations
    Popular protocols which support Pipelining: HTTP, SMTP,
    IMAP
    “ If you write your own protocol, think about
    Pipelining! ”

  39. DON'T WANT THE HASSLE?
    Netty
    Vert.x
    Xnio
    Grizzly
    Apache Mina
    “There are a few frameworks to the rescue....”

  40. WANT TO LEARN MORE
    “Attend my talk about Netty 4 tomorrow ;)”


  41. QUESTIONS?


  42. THANKS
