Slide 1

Slide 1 text

Scaling Services with Netty Norman Maurer This is not a contribution

Slide 2

Slide 2 text

What Scale?
3 tb per sec
~650,000 instances of Netty in production
public facing + internal facing
10+ million ops per sec
10+ million connections terminated
plaintext and TLS / SSL
Non-Blocking FTW!

Slide 3

Slide 3 text

Using an Apple Service? Chances are good Netty is involved somehow.

Slide 4

Slide 4 text

How Netty helps

Slide 5

Slide 5 text

SSL / TLS Custom implementation needed

Slide 6

Slide 6 text

JNI SSLEngine implementation
using well-proven C libraries
using OpenSSL / LibreSSL / BoringSSL via JNI
ALPN / NPN
Session Tickets
minimize Object creation == less GC
drop-in replacement for SSLEngineImpl

Slide 7

Slide 7 text

Let's get ready to rumble
SSLEngine Benchmark
TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 is the only cipher suite required for HTTP/2
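Because the OpenSSL-backed engine is described as a drop-in replacement, code written against the standard javax.net.ssl.SSLEngine API should not care which implementation it gets. The sketch below uses only the JDK's default engine (not Netty's JNI engine) to check for and enable the cipher suite named on the benchmark slide. The class and method names are hypothetical and chosen for illustration.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

final class Http2CipherCheck {
    // The cipher suite named on the slide.
    static final String HTTP2_CIPHER = "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256";

    // Creates a server-mode engine from the JDK default context.
    // Assumption: a Netty-provided engine would be obtained differently,
    // but would be used through the same SSLEngine API afterwards.
    static SSLEngine newServerEngine() {
        try {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            engine.setUseClientMode(false);
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    // True if the engine implementation knows the HTTP/2 cipher suite at all.
    static boolean supportsHttp2Cipher(SSLEngine engine) {
        return Arrays.asList(engine.getSupportedCipherSuites()).contains(HTTP2_CIPHER);
    }

    // Restricts the engine to the single suite, as a benchmark might do
    // to compare implementations on equal terms.
    static void restrictToHttp2Cipher(SSLEngine engine) {
        engine.setEnabledCipherSuites(new String[] {HTTP2_CIPHER});
    }
}
```

Pinning a single suite like this keeps a comparison between engine implementations from being skewed by different negotiated ciphers.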

Slide 8

Slide 8 text

Advanced Socket Features Linux specific features needed

Slide 9

Slide 9 text

JNI based transport (Linux)
Support more advanced features
Socket Options: SO_REUSEPORT, TCP_INFO, TCP_CORK, TCP_FAST_OPEN, TCP_NOTSENT_LOWAT
splice(…)
Unix Domain Sockets
Edge Triggered and Level Triggered

Slide 10

Slide 10 text

Buffer Pooling Allocations are expensive

Slide 11

Slide 11 text

Allocation times
[Chart: allocation time in nanoseconds (0 to 6000) against buffer size in bytes (0 to 65536) for Unpooled Heap, Pooled Heap, Unpooled Direct, and Pooled Direct buffers]

Slide 12

Slide 12 text

PooledByteBufAllocator: a peek into the internals
based on jemalloc paper (3.x)
ThreadLocal caches for lock-free allocation
synchronize per Arena that holds the different chunks of memory
different size classes
reduce fragmentation
[Diagram: Thread 1 and Thread 2, each with its own ThreadLocal cache (1 and 2), and Arena 1, Arena 2, and Arena 3, each with its own size classes]
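The layering on this slide (a per-thread cache in front of a synchronized arena, with requests rounded up to size classes) can be illustrated with a toy pool. This is a minimal sketch under simplifying assumptions, not Netty's implementation: it uses one arena instead of several, power-of-two size classes only, heap ByteBuffers, and a hypothetical PoolSketch class whose names do not come from the slides.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

final class PoolSketch {
    private static final int MAX_CACHED_PER_CLASS = 4; // arbitrary cap for the sketch

    // Rounds a request up to a power-of-two size class (minimum 16 bytes).
    static int sizeClass(int capacity) {
        return capacity <= 16 ? 16 : Integer.highestOneBit(capacity - 1) << 1;
    }

    // Shared arena: every access takes the arena lock.
    private static final class Arena {
        private final Map<Integer, ArrayDeque<ByteBuffer>> free = new HashMap<>();
        private int freshAllocations;

        synchronized ByteBuffer allocate(int sizeClass) {
            ArrayDeque<ByteBuffer> queue = free.get(sizeClass);
            ByteBuffer buf = queue == null ? null : queue.poll();
            if (buf == null) {
                freshAllocations++;
                buf = ByteBuffer.allocate(sizeClass);
            }
            return buf;
        }

        synchronized void release(ByteBuffer buf) {
            free.computeIfAbsent(buf.capacity(), k -> new ArrayDeque<>()).push(buf);
        }

        synchronized int freshAllocations() {
            return freshAllocations;
        }
    }

    private final Arena arena = new Arena();

    // Per-thread cache: touched only by its owning thread, so no lock is needed.
    private final ThreadLocal<Map<Integer, ArrayDeque<ByteBuffer>>> cache =
            ThreadLocal.withInitial(HashMap::new);

    ByteBuffer allocate(int capacity) {
        int sc = sizeClass(capacity);
        ArrayDeque<ByteBuffer> queue = cache.get().get(sc);
        ByteBuffer buf = queue == null ? null : queue.poll();
        if (buf == null) {
            buf = arena.allocate(sc); // slow path: falls back to the locked arena
        }
        buf.clear();
        return buf;
    }

    void release(ByteBuffer buf) {
        ArrayDeque<ByteBuffer> queue =
                cache.get().computeIfAbsent(buf.capacity(), k -> new ArrayDeque<>());
        if (queue.size() < MAX_CACHED_PER_CLASS) {
            queue.push(buf);    // fast path: stays with this thread
        } else {
            arena.release(buf); // cache full: hand back to the shared arena
        }
    }

    int freshAllocations() {
        return arena.freshAllocations();
    }
}
```

In this sketch, a thread that allocates 1000 bytes, releases the buffer, and then asks for 600 bytes gets the same 1024-byte buffer back from its own cache without touching the arena lock.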

Slide 13

Slide 13 text

About the project adoption and more

Slide 14

Slide 14 text

Companies using Netty: Airbnb, Alibaba, Apple, eBay, Facebook, Fitbit, Instagram, Google, LinkedIn, Line, Microsoft, Netflix, Nike, Pivotal, Red Hat, Spotify, Square, Squarespace, Twitter, Uber, VMware, Yahoo …and many more

Slide 15

Slide 15 text

Open Source Projects using Netty: Akka, Apache Cassandra, Apache Spark, Couchbase, Elasticsearch, Finagle, Gatling, gRPC, Nifty, Play, Ratpack, Riposte, Vert.x, Spring Web …and many more

Slide 16

Slide 16 text

We love feedback
What would you like to see?
if you have any feedback, let us know
open an issue
ask a question on the mailing list

Slide 17

Slide 17 text

We love contributions
Get Involved
Mailing list
IRC: #netty on irc.freenode.org
GitHub
ASL2
OSS FTW!

Slide 18

Slide 18 text

New developments in Netty Scott Mitchell

Slide 19

Slide 19 text

KQueue Transport
UDS, UDP, TCP
Decouple general unix code from EPOLL transport
More object oriented code (BsdSocket, LinuxSocket)
EV_CLEAR (aka Edge Triggered)
[Image: Hexley DarwinOS Mascot. Copyright 2000 by Jon Hooper. All rights Reserved. http://www.hexley.com/license.html]

Slide 20

Slide 20 text

KQueue Transport (macOS)
Removed Map
Socket Options: RCV_ALLOC_TRANSPORT_PROVIDES_GUESS, SO_ACCEPTFILTER
Future: reduce duplication in middle tier logic; splice like capability?

Slide 21

Slide 21 text

OpenSSL Features
OPENSSL_REFCNT
ByteBuffer BIO
Host Name Verification
OCSP (BoringSSL & OpenSSL)

Slide 22

Slide 22 text

OpenSSL Future
Multiple TLS packets per wrap/unwrap
Aggregate writes before wrapping
TLS 1.3

Slide 23

Slide 23 text

DNS
JDK DNS + Async Networking = _____
Client Subnet in DNS Queries
Follow redirects in authority section
Respect default host configuration for DNS servers (unix)
Programmatically configurable
Future: DNS Request Lifecycle Tracing
[Diagram labels: DNS Query Originator 198.51.100.241; Public Recursor 192.0.2.247; example.com authoritative server; example.com hosts 192.0.2.14 and 203.0.113.18. Numbered steps: 1 "example.com A ?", 2 "example.com A ?", 3 "example.com A 192.0.2.14", 4]

Slide 24

Slide 24 text

HTTP/2
HPACK: ByteBuf, no more buffering, var int improvements, header list/table size
Limit reserved streams
Dependency state refactor
Future: HTTP/2 Child Channels
[Diagram: priority tree with Connection Stream ID=0 and nodes Leader (ID=3, Weight=201), Unblocked (ID=5, Weight=101), Background (ID=7, Weight=1), Speculative (ID=9, Weight=1), Follower (ID=11, Weight=1), GET / (ID=13, Weight=2), GET /myscript.js (ID=15, Weight=2), GET /img.gif (ID=17, Weight=12), GET /ajax.xml (ID=19, Weight=2)]
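The "var int" item presumably refers to HPACK's prefix-coded integer representation (RFC 7541, section 5.1). The slide does not say what the improvements were, so the following is only a sketch of the encoding itself, written against plain byte arrays rather than Netty's ByteBuf. HpackIntSketch is a hypothetical name, and the decoder assumes well-formed input that fits in an int.

```java
import java.io.ByteArrayOutputStream;

final class HpackIntSketch {
    // Encodes a non-negative integer with an N-bit prefix (1 <= N <= 8).
    // The flag bits that share the first octet are left as zero here.
    static byte[] encode(int value, int prefixBits) {
        if (value < 0 || prefixBits < 1 || prefixBits > 8) {
            throw new IllegalArgumentException("unsupported value or prefix size");
        }
        int maxPrefix = (1 << prefixBits) - 1;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value < maxPrefix) {
            out.write(value);               // fits entirely in the prefix
            return out.toByteArray();
        }
        out.write(maxPrefix);               // prefix saturated, continuation follows
        int rest = value - maxPrefix;
        while (rest >= 128) {
            out.write((rest & 0x7F) | 0x80); // 7 payload bits, continuation flag set
            rest >>>= 7;
        }
        out.write(rest);                    // last octet, continuation flag clear
        return out.toByteArray();
    }

    // Decodes an integer produced by encode(). No overflow or bounds checks:
    // a real decoder must reject oversized or truncated input.
    static int decode(byte[] in, int prefixBits) {
        int maxPrefix = (1 << prefixBits) - 1;
        int value = in[0] & maxPrefix;
        if (value < maxPrefix) {
            return value;
        }
        int shift = 0;
        int index = 1;
        int b;
        do {
            b = in[index++] & 0xFF;
            value += (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }
}
```

With a 5-bit prefix, 1337 encodes to the three octets 31, 154, 10, which matches the worked example in the RFC, while 10 fits in the prefix and encodes to a single octet.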

Slide 25

Slide 25 text

HTTP/2 Dependency Tree
Original Attempt: overly fair -> bad goodput
UniformStreamByteDistributor: no priority -> go fast
WeightedFairQueueByteDistributor: What if the NIC executed in parallel? Fairness vs goodput. Non-active streams accounted for.
[Diagram: Stream 0 with Stream 3 (Weight=10, Bytes=100), Stream 5 (Weight=100, Bytes=200), Stream 7 (Weight=16, Bytes=300), Stream 9 (Weight=16, Bytes=400)]

Slide 26

Slide 26 text

HTTP/2 WeightedFairQueueByteDistributor (1)
Allocation Quantum = 50
Stream 0: PseudoTime=0
Stream 3: Weight=10, Bytes=100, PseudoTime=0
Stream 5: Weight=100, Bytes=200, PseudoTime=0
Stream 7: Weight=16, Bytes=300, PseudoTime=0
Stream 9: Weight=16, Bytes=400, PseudoTime=0
Queue entries: Stream 3 PseudoTimeToWrite=0; Stream 5 PseudoTimeToWrite=0; Stream 7 PseudoTimeToWrite=0; Stream 9 PseudoTimeToWrite=0

Slide 27

Slide 27 text

HTTP/2 WeightedFairQueueByteDistributor (2)
Stream 0: PseudoTime=50
Stream 3: Weight=10, Bytes=50, PseudoTime=0
Stream 5: Weight=100, Bytes=200, PseudoTime=0
Stream 7: Weight=16, Bytes=300, PseudoTime=0
Stream 9: Weight=16, Bytes=400, PseudoTime=0
Queue entries: Stream 5 PseudoTimeToWrite=0; Stream 3 PseudoTimeToWrite=550; Stream 7 PseudoTimeToWrite=0; Stream 9 PseudoTimeToWrite=0

Slide 28

Slide 28 text

HTTP/2 WeightedFairQueueByteDistributor (3)
Stream 0: PseudoTime=250
Stream 3: Weight=10, Bytes=50, PseudoTime=0
Stream 5: Weight=100, Bytes=0, PseudoTime=0
Stream 7: Weight=16, Bytes=300, PseudoTime=0
Stream 9: Weight=16, Bytes=400, PseudoTime=0
Queue entries: Stream 3 PseudoTimeToWrite=550; Stream 7 PseudoTimeToWrite=0; Stream 9 PseudoTimeToWrite=0
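The three steps above show streams being picked from a queue ordered by PseudoTimeToWrite, with each write pushing the writer's next turn further out the lower its weight. The slides do not give the exact update formula, so the sketch below uses a textbook weighted-fair-queuing update as an assumption (next turn advances by bytes written times total weight divided by the stream's weight). It will not reproduce the 550 figure from the slides, and WfqSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class WfqSketch {
    static final class PendingStream {
        final int id;
        final int weight;
        int pendingBytes;
        long pseudoTimeToWrite; // virtual time at which this stream may write next

        PendingStream(int id, int weight, int pendingBytes) {
            this.id = id;
            this.weight = weight;
            this.pendingBytes = pendingBytes;
        }
    }

    // Writes at most `quantum` bytes per turn and returns the stream id chosen on each turn.
    static List<Integer> distribute(List<PendingStream> streams, int quantum) {
        // Lowest pseudo time first; ties are broken by stream id (an assumption).
        PriorityQueue<PendingStream> queue = new PriorityQueue<>(
                Comparator.<PendingStream>comparingLong(s -> s.pseudoTimeToWrite)
                        .thenComparingInt(s -> s.id));
        long totalWeight = 0;
        for (PendingStream s : streams) {
            totalWeight += s.weight;
            if (s.pendingBytes > 0) {
                queue.add(s);
            }
        }
        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            PendingStream s = queue.poll();
            int sent = Math.min(quantum, s.pendingBytes);
            s.pendingBytes -= sent;
            // Assumed update rule: heavier streams advance more slowly in virtual time,
            // so they come back to the head of the queue sooner.
            s.pseudoTimeToWrite += sent * totalWeight / s.weight;
            order.add(s.id);
            if (s.pendingBytes > 0) {
                queue.add(s); // still has data: wait for the next turn
            }
        }
        return order;
    }

    // The four streams and the allocation quantum from the slides.
    static List<Integer> slideExample() {
        return distribute(Arrays.asList(
                new PendingStream(3, 10, 100),
                new PendingStream(5, 100, 200),
                new PendingStream(7, 16, 300),
                new PendingStream(9, 16, 400)), 50);
    }
}
```

Under these assumptions, slideExample() begins with the turns 3, 5, 7, 9, 5, 5, 5: after the initial round, the weight-100 stream drains all 200 of its bytes before the weight-10 stream gets a second turn.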

Slide 29

Slide 29 text

Questions & Answers

Slide 30

Slide 30 text

Thank You