kazuho

April 22, 2026
Rapid Start: Faster Internet Connections, with Ruby's Help

Presented at Rubykaigi 2026.


Transcript

  1. • lead developer of the H2O HTTP server ▪ used

    by Fastly ▪ has its own HTTP/1, 2, 3, TLS/1.3, QUIC implementation ▪ supports HTTP routing using mruby (Rack) • as a hobby programmer: ◦ rat (ruby-based IPv4 NAT) • as a co-author of RFCs: ◦ RFC 8297 (HTTP 103 Early Hints) ◦ RFC 9218 (HTTP Extensible Priorities) ◦ RFC 9849 (TLS Encrypted Client Hello) Who am I 2
  2. • Rapid Start ◦ Fastly’s new startup algorithm for its

    congestion control • jrf - our ruby-based tool for log analysis • Visualization of network-related performance tests Topics 3
  3. • TCP+TLS/1.2: ◦ full handshake: 3 RT ◦ resumption: 2

    RT • TCP+TLS/1.3: ◦ full handshake: 2 RT ◦ resumption: 1 RT • QUIC: ◦ full handshake: 1 RT ◦ resumption: 0 RT ※RT = number of round-trips Time to establish a connection 5
  4. • TCP+TLS/1.2: ◦ full handshake: 3 RT ◦ resumption: 2

    RT • TCP+TLS/1.3: ◦ full handshake: 2 RT ◦ resumption: 1 RT • QUIC: ◦ full handshake: 1 RT ◦ resumption: 0 RT ※RT = number of round-trips Time to establish a connection HTTP/3 6
  5. • With HTTP/3, handshake latency is minimized: ◦ full handshake:

    1 RT ◦ resumption: 0 RT Reducing the latency of HTTP 7
  6. • With HTTP/3, handshake latency is minimized: ◦ full handshake:

    1 RT ◦ resumption: 0 RT • Time To First Byte (TTFB) is: ◦ full handshake: 2 RT ◦ resumption: 1 RT Reducing the latency of HTTP 8
  7. • With HTTP/3, handshake latency is minimized: ◦ full handshake:

    1 RT ◦ resumption: 0 RT • Time To First Byte (TTFB) is: ◦ full handshake: 2 RT ◦ resumption: 1 RT • What about Time To Last Byte (TTLB)? ◦ TTLB is typically TTFB plus a transfer time governed by the speed of Slow Start Reducing the latency of HTTP 9
  8. • Initial phase of congestion control: ◦ used when the

    available bandwidth is unknown ◦ to quickly determine the available bandwidth Slow Start 10
  9. • Initial phase of congestion control: ◦ used when the

    available bandwidth is unknown ◦ to quickly determine the available bandwidth • Start by sending IW packets: ◦ IW = 10 (RFC), 30 (real-world) ◦ send 2x more for each ack received Slow Start 11
  10. • Initial phase of congestion control: ◦ used to quickly

    fill the available bandwidth, which is unknown at the beginning of the connection • Starts by sending IW packets: ◦ IW = 10 (RFC), 30 (real-world) ◦ send 2x more for each ack received • When packets are dropped (i.e., the network overflows), slow start enters “recovery” to repair lost packets, then congestion control switches to the second phase, known as congestion avoidance Slow Start 12
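The exponential growth described above can be sketched in a few lines of Ruby. This is a minimal illustration of the doubling rule, not the H2O implementation; `rounds_to_reach` is a name of my own.

```ruby
# Slow Start growth: start at IW packets in flight, then double
# each round-trip until the window covers the target packet count.
def rounds_to_reach(target_packets, iw: 30)
  cwnd = iw     # first round-trip sends IW packets
  rounds = 1
  while cwnd < target_packets
    cwnd *= 2   # "send 2x more for each ack received"
    rounds += 1
  end
  rounds
end

# From IW = 30, reaching a 209-packet target takes 4 round-trips
# (30 -> 60 -> 120 -> 240):
rounds_to_reach(209)  # => 4
```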
  11. Slow Start and BDP [packet diagram: sender, queue, bottleneck link]

    Idle BDP: number of packets needed to fully utilize the bottleneck link without building queue. Queue builds up when packets arrive faster than the bottleneck link. When the queue overflows, packets are dropped.
  12. • Idle BDP = 55Mb/s * 0.039s [packet diagram: idle BDP plus queue before the bottleneck link] Slow Start and BDP

  13. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets [packet diagram] Slow Start and BDP

  14. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: [packet diagram] Slow Start and BDP

  15. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets [packet diagram] Slow Start and BDP

  16. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets ◦ 2RT: 60 packets [packet diagram] Slow Start and BDP

  17. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets ◦ 2RT: 60 packets ◦ 3RT: 120 packets [packet diagram] Slow Start and BDP

  18. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets ◦ 2RT: 60 packets ◦ 3RT: 120 packets ◦ 4RT: 240 packets [packet diagram: bottleneck link is finally saturated] Slow Start and BDP
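The slides' idle-BDP arithmetic can be reproduced directly. A small sketch; `idle_bdp_packets` is my own helper, and the ~1280-byte packet size is my assumption (the deck only states "≒ 209 packets", and 268KB / 209 works out to roughly a full-size QUIC datagram).

```ruby
# Idle BDP = bandwidth x RTT, expressed in packets.
def idle_bdp_packets(bandwidth_bps, rtt_sec, packet_bytes: 1280)
  bytes = bandwidth_bps * rtt_sec / 8.0  # bits -> bytes
  (bytes / packet_bytes).floor
end

idle_bdp_packets(55_000_000, 0.039)  # => 209
```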
  19. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator 22
  20. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator 23
  21. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator nothing is received, as the sender stops initial sending after 0.5 RTT 24
  22. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator nothing is received, as the sender stops initial sending after 0.5 RTT underutilization 25
  23. Impact of queuing and drops (VDSL) [chart annotations: moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow] 29
  24. Impact of queuing and drops (VDSL) [chart annotations: moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st] 30
  25. Impact of queuing and drops (VDSL) [chart annotations: data cannot be used until packet drops are repaired / moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st] 31
  26. Impact of queuing and drops (VDSL) [chart annotations: data cannot be used until packet drops are repaired / moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st / excess draining followed by a burst] 32
  27. • Utilize the available bandwidth as soon as possible ◦

    Initial window larger than 30 packets ◦ More aggressive growth than 2x per RTT Think of an ideal startup 33
  28. • Utilize the available bandwidth as soon as possible ◦

    Initial window larger than 30 packets ◦ More aggressive growth than 2x per RTT • Minimize the negative impact of packet drops ◦ To avoid drops in short transmissions, delay the initial drop as late as possible ◦ Reduce the number of recovery events ◦ Reduce the number of packets dropped per recovery Think of an ideal startup 34
  29. • Utilize the available bandwidth as soon as possible ◦

    Initial window larger than 30 packets ◦ More aggressive growth than 2x per RTT • Minimize the negative impact of packet drops ◦ To avoid drops in short transmissions, delay the initial drop as late as possible ◦ Reduce the number of recovery events ◦ Reduce the number of packets dropped per recovery • To mitigate the risk of overflowing a queue other than that immediately before the bottleneck, avoid bursty sending Think of an ideal startup 35
  30. • Proposed by Fastly (1st Internet-Draft submitted in Nov 2025)

    ※CWND: estimate of the full BDP (idle BDP + queue capacity)

                     Slow Start            Rapid Start
    initial sending  stops after 0.5 RTT   stops after 1 RTT
    increase         2x per RTT            3x per RTT (switches to 2x when observing queue buildup)
    recovery         CWND *= 0.5           determine CWND based on packet drop ratio

    Rapid Start 36
  31. • Slow Start: ◦ Sends IW packets for 0.5 RTT

    • Rapid Start: ◦ Sends 2x IW packets for 1 RTT Rapid Start: initial sending 37
  32. • Slow Start: ◦ Sends IW packets for 0.5 RTT

    • Rapid Start: ◦ Sends 2x IW packets for 1 RTT ◦ Risk: potential queue buildup and earlier packet drops ▪ But no more bursty than Slow Start, as the interval between each packet sent remains the same Rapid Start: initial sending 38
  33. • Slow Start: 2x • Rapid Start: ◦ queue_buildup? (see note) ?

    2x : 3x Note: recommended queue-buildup threshold is: rtt_floor > min(rtt_min + 4ms, rtt_min * 1.1), where rtt_floor is the smallest RTT observed over the most recent 1RT Rapid Start: CWND increase 39
  34. • Slow Start: 2x • Rapid Start: ◦ queue_buildup? (see note) ?

    2x : 3x ◦ rationale: queue buildup is an outcome of the sender sending faster than the bottleneck link ▪ slower increase delays the chance of packet drops Note: recommended queue-buildup threshold is: rtt_floor > min(rtt_min + 4ms, rtt_min * 1.1), where rtt_floor is the smallest RTT observed over the most recent 1RT Rapid Start: CWND increase 40
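The growth-rate selection above can be written out as a short sketch. The threshold is the one recommended in the slide's note; the method names (`queue_buildup?`, `growth_factor`) are mine, not the draft's.

```ruby
# Detect queue buildup from RTT inflation: the smallest RTT seen
# in the most recent round-trip (rtt_floor) exceeding the minimum
# RTT plus some slack suggests a queue is forming.
QUEUE_BUILDUP_SLACK_SEC = 0.004  # 4ms, per the recommended threshold

def queue_buildup?(rtt_floor, rtt_min)
  rtt_floor > [rtt_min + QUEUE_BUILDUP_SLACK_SEC, rtt_min * 1.1].min
end

# Rapid Start: grow 3x per RTT, falling back to 2x once queue
# buildup is observed (slower increase delays packet drops).
def growth_factor(rtt_floor, rtt_min)
  queue_buildup?(rtt_floor, rtt_min) ? 2 : 3
end

growth_factor(0.010, 0.010)  # => 3  (no buildup observed)
growth_factor(0.012, 0.010)  # => 2  (RTT floor inflated past threshold)
```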
  35. Recap: Impact of queuing and drops [chart annotations: data cannot be used until packet drops are repaired / moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st / excess draining followed by a burst] 42
  36. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) Slow Start’s recovery problem 45
  37. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) ◦ As Slow Start increases CWND by 2x per RTT, CWND ~ 2x full_BDP when observing a drop Slow Start’s recovery problem 46
  38. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) ◦ As Slow Start increases CWND by 2x per RTT, CWND ~ 2x full_BDP when observing a drop ◦ Reducing CWND to half yields the full BDP, and therefore congestion control immediately fills the bottleneck again Slow Start’s recovery problem 47
  39. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) ◦ As Slow Start increases CWND by 2x per RTT, CWND ~ 2x full_BDP when observing a drop ◦ Reducing CWND to half yields the full BDP, and therefore congestion control immediately fills the bottleneck again • Reducing CWND to ¼ is not a good solution, because that would fully drain the queue, leading to underutilization of the bottleneck link Slow Start’s recovery problem 48
  40. • For each ack or packet drop, gradually decrease CWND,

    so that the CWND at recovery exit becomes: 0.5 * bytes_acked_in_recovery ◦ because the bytes acked in 1 RT reflect the full BDP • Benefits: ◦ Works regardless of the increase ratio ◦ As CWND is gradually reduced, transmission resumes before the queue is fully drained Rapid Start: recovery 49
  41. • Upon entering recovery: cwnd *= 5/6 • For each

    ACK: cwnd -= 1/3 * bytes_newly_acked • For each loss: cwnd -= 5/6 * bytes_newly_lost See draft-kazuho-ietf-rapid-start-02 for how these constants are derived Rapid Start: recovery 50
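The three recovery rules above translate directly into code. This is a minimal sketch of the bookkeeping only (the class name and structure are mine; see the draft for the derivation of the constants):

```ruby
# Rapid Start recovery: cut CWND to 5/6 on entry, then shrink it
# gradually per ack and per loss, so it converges toward
# 0.5 * bytes_acked_in_recovery by recovery exit.
class RapidStartRecovery
  attr_reader :cwnd

  def initialize(cwnd)
    @cwnd = cwnd * 5.0 / 6          # upon entering recovery
  end

  def on_ack(bytes_newly_acked)
    @cwnd -= bytes_newly_acked / 3.0
  end

  def on_loss(bytes_newly_lost)
    @cwnd -= bytes_newly_lost * 5.0 / 6
  end
end

r = RapidStartRecovery.new(6000)
r.cwnd         # => 5000.0
r.on_ack(300)
r.cwnd         # => 4900.0
```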
  42. Rapid Start on Simulator (VDSL) [chart annotations: enters recovery only once, but takes longer to repair drops due to 3x overshoot / no idle moments / queue buildup / packet drops] 56
  43. Rapid Start on Simulator (VDSL) [chart annotations: enters recovery only once, but takes longer to repair drops due to 3x overshoot / no idle moments / queue buildup / packet drops / lands at the right queue depth] 57
  44. • HTTP/3 connections divided into 4 groups • For connections

    serving cached objects >= 200KB as the first request, record transport-level statistics and TTLB when all bytes for that cached object are acked • for 1 week on 7 POPs across the globe: East / SE Asia, East / West Europe, Africa, North / South America

    Setup: divided into 4 groups
                          initial sending     increase  recovery
    baseline (slow start) 30 pkts in 0.5 RTT  2x        CWND *= 0.5
    jumpstart             60 pkts in 1 RTT    2x        CWND *= 0.5
    rapid-wo-jump         30 pkts in 0.5 RTT  3x / 2x   CWND reduced relative to loss ratio
    rapidstart            60 pkts in 1 RTT    3x / 2x   CWND reduced relative to loss ratio
    59
  45. {"module":"h2o","type":"h3s_stream0_ttlb","tid":397502,"time":1773026516280,"conn_id":1907798,"method":"GET","content_length":226578,"ttlb":364,"num-packets.received":24,"num-packets.decryption-failed":0,"num-packets.sent":191,"num-packets.lost":0,"num-packets.lost-time-threshold":0,"num-packets.ack-received":191,"num-packets.late-acked":0,"num-packets.initial-received":2,"num-packets.zero-rtt-received":0,"num-packets.handshake-received":2,"num-packets.initial-sent":1,"num-packets.zero-rtt-sent":0,"num-packets.handshake-sent":4,"num-packets.received-out-of-order":0,"num-packets.received-ecn-ect0":0,"num-packets.received-ecn-ect1":0,"num-packets.received-ecn-ce":0,"num-packets.acked-ecn-ect0":0,"num-packets.acked-ecn-ect1":0,"num-packets.acked-ecn-ce":0,"num-packets.sent-promoted-paths":0,"num-packets.ack-received-promoted-paths":0,"num-packets.max-delayed":0,"num-packets.delayed-used":0,"num-bytes.received":4737,"num-bytes.sent":236943,"num-bytes.lost":0,"num-bytes.ack-received":236895,"num-bytes.stream-data-sent":231728,"num-bytes.stream-data-resent":226,"num-frames-received.padding":3259,"num-frames-received.ping":1,"num-frames-received.ack":19,"num-frames-received.reset_stream":0,"num-frames-received.stop_sending":0,"num-frames-received.crypto":2,"num-frames-received.new_token":0,"num-frames-received.stream":2,"num-frames-received.max_data":0,"num-frames-received.max_stream_data":0,"num-frames-received.max_streams_bidi":0,"num-frames-received.max_streams_uni":0,"num-frames-received.data_blocked":0,"num-frames-received.stream_data_blocked":0,"num-frames-received.streams_blocked":0,"num-frames-received.new_connection_id":0,"num-frames-received.retire_connection_id":0,"num-frames-received.path_challenge":0,"num-frames-received.path_response":0,"num-frames-received.transport_close":0,"num-frames-received.application_close":0,"num-frames-received.handshake_done":0,"num-frames-received.datagram":0,"num-frames-received.ack_frequency":0,"num-frames-received.immediate_ack":0,"num-frames-sent.padding":0,"num-frames-sent.ping":1,"num-frames-sent.ack":3,"num-frames-sent.reset_stream":0,"num-frames-sent.stop_sending":0,"num-frames-sent.crypto":7,"num-frames-sent.new_token":2,"num-frames-sent.stream":188,"num-frames-sent.max_data":0,"num-frames-sent.max_stream_data":0,"num-frames-sent.max_streams_bidi":0,"num-frames-sent.max_streams_uni":0,"num-frames-sent.data_blocked":0,"num-frames-sent.stream_data_blocked":0,"num-frames-sent.streams_blocked":0,"num-frames-sent.new_connection_id":6,"num-frames-sent.retire_connection_id":0,"num-frames-sent.path_challenge":0,"num-frames-sent.path_response":0,"num-frames-sent.transport_close":0,"num-frames-sent.application_close":0,"num-frames-sent.handshake_done":1,"num-frames-sent.datagram":0,"num-frames-sent.ack_frequency":0,"num-frames-sent.immediate_ack":0,"num-paths.created":0,"num-paths.validated":0,"num-paths.validation-failed":0,"num-paths.migration-elicited":0,"num-paths.promoted":0,"num-paths.closed-no-dcid":0,"num-paths.ecn-validated":0,"num-paths.ecn-failed":1,"num-ptos":1,"num-handshake-timeouts":0,"num-initial-handshake-exceeded":0,"num-jumpstart-applicable":1,"quic.jumpstart.applicable":1,"num-rapid-start":0,"num-paced":1,"num-respected-app-limited":0,"handshake-confirmed-msec":369,"jumpstart.prev-rate":0,"jumpstart.prev-rtt":0,"jumpstart.new-rtt":106,"jumpstart.cwnd":0,"quic.jumpstart.time-to-idle":647,"token-sent.at":0,"token-sent.rate":579889,"token-sent.rtt":67,"rtt.minimum":66,"rtt.smoothed":81,"rtt.variance":19,"rtt.latest":75,"loss-thresholds.use-packet-based":1,"loss-thresholds.time-based-percentile":128,"cc.cwnd":273280,"cc.ssthresh":4294967295,"cc.cwnd-initial":44160,"cc.cwnd-exiting-slow-start":0,"cc.exit-slow-start-at":9223372036854775807,"cc.cwnd-exiting-jumpstart":0,"cc.cwnd-minimum":4294967295,"cc.cwnd-maximum":273280,"cc.num-loss-episodes":0,"cc.num-ecn-loss-episodes":0,"delivery-rate.latest":210149,"delivery-rate.smoothed":739154,"delivery-rate.stdev":1078961,"num-sentmap-packets-largest":89}

    Example: stats for 1 connection 60
  46. • Size of the dataset in 1 experiment: ◦ LDJSON

    of 20M lines; 80GB (3.7GB in .gz) Analyzing data 61
  47. • Size of the dataset in 1 experiment: ◦ LDJSON

    of 20M lines; 80GB (3.7GB in .gz) • Need to apply various ad-hoc queries: ◦ jq is the obvious choice, however… Analyzing data 62
  48. • The grammar is not intuitive • Slow • Not

    suited for processing huge LDJSON ◦ Example: | min buffers the entire input ◦ whereas log analysis is almost always a streaming, map-reduce-like operation over huge data Issues with jq 63
  49. jq -s '{ "min": (map(."rtt.minimum") | min), "max": (map(."rtt.minimum") |

    max), "avg": (map(."rtt.minimum") | add / length) }' min/max/avg over rtt.minimum 64
  50. jq -s '{ "min": (map(."rtt.minimum") | min), "max": (map(."rtt.minimum") |

    max), "avg": (map(."rtt.minimum") | add / length) }' min/max/avg over rtt.minimum -s buffers the entire input; jq essentially stops working when the input is larger than RAM size 65
  51. jq -n ' reduce inputs as $o ( {min: null,

    max: null, sum: 0, n: 0}; ($o."rtt.minimum") as $x | { min: (if .min == null or $x < .min then $x else .min end), max: (if .max == null or $x > .max then $x else .max end), sum: (.sum + $x), n: (.n + 1) } ) | { min, max, avg: (.sum / .n) }' min/max/avg over rtt.minimum With -n, each JSON object is processed separately; but aggregation logic needs to be hand-written 66
  52. • Streaming processing is easy to write • JSON parser

    is fast • The script is JIT-compiled Writing ruby scripts instead 67
  53. • Streaming processing is easy to write • JSON parser

    is fast • The script is JIT-compiled • However: ◦ It becomes too long as a one-liner ◦ Ends up as a script with many, many options ▪ Hard to maintain Writing ruby scripts instead 68
  54. • Streaming processing is easy to write • JSON parser

    is fast • The script is JIT-compiled • However: ◦ It becomes too long as a one-liner ◦ Ends up as a script with many, many options ▪ Hard to maintain ◦ Letting AI write it is an option, but how would you verify that your ad-hoc query was converted to correct code? Writing ruby scripts instead 69
  55. • more SQL-like grammar + ruby DSL • compile the

    query language using eval ◦ let JIT optimize the runtime and the query altogether Writing jq (improved) in ruby 71
  56. • more SQL-like grammar + ruby DSL • compile the

    query language using eval ◦ let JIT optimize the runtime and the query altogether • Streaming processing of NDJSON Writing jq (improved) in ruby 72
  57. # Filter then extract jrf 'select(_["x"] > 10) >> _["foo"]'

    # Aggregate jrf 'select(_["item"] == "Apple") >> sum(_["count"])' jrf 'percentile(_["ttlb"], 0.50)' # Group by key and aggregate jrf 'group_by(_["item"]) { |row| sum(row["count"] * row["price"]) }' jrf 74
  58. • Syntax: stages connected using >> ◦ Each stage is

    just a ruby block • Filter: ◦ select(expr) • Transform: ◦ _["foo"] • Aggregation: ◦ min(expr), max(expr), sum(expr), … ◦ reduce(initial) { any ruby code } jrf 75
  59. class Stage def initialize(block, src: nil) ... @ctx =

    Class.new(RowContext) do define_method(:__jrf_expr__, &block) end end end # instantiated as: Stage.new(eval("proc { #{stage[:src]} }", ...)) jrf - internals Each stage expression is converted to a method, and gets called 76
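A runnable miniature of the pattern on this slide (my reconstruction, not jrf's actual code): the stage source is eval'd into a proc, turned into a method on a per-stage row-context class, and then runs as plain Ruby methods that the JIT can optimize together with the runtime.

```ruby
# Minimal row context: `_` exposes the current row, so stage
# expressions like `_["x"] * 2` read naturally.
class RowContext
  def initialize(row)
    @row = row
  end

  def _
    @row
  end
end

# Compile a stage expression: eval the source into a proc, then
# make it a real method on an anonymous RowContext subclass.
# define_method rebinds `self`, so `_` resolves to the instance.
def compile_stage(src)
  block = eval("proc { #{src} }")
  Class.new(RowContext) { define_method(:__jrf_expr__, &block) }
end

stage = compile_stage('_["x"] * 2')
stage.new({ "x" => 21 }).__jrf_expr__  # => 42
```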
  60. • In typical log processing: ◦ filtering and transformation happen

    before aggregation ◦ logs are split into multiple files jrf -P 10 'filter >> transform >> reduce' jrf - automatic parallelization 77
  61. • In typical log processing: ◦ filtering and transformation happen

    before aggregation ◦ logs are split into multiple files • Therefore, processing of each file can be parallelized for: ◦ filtering and transformations in stages upfront jrf -P 10 'filter >> transform >> reduce' jrf - automatic parallelization 78
  62. • In typical log processing: ◦ filtering and transformation happen

    before aggregation ◦ logs are split into multiple files • Therefore, processing of each file can be parallelized for: ◦ filtering and transformations in stages upfront ◦ certain aggregations (e.g., min, max, sum) ▪ each thread calculates its own, then the results are merged jrf -P 10 'filter >> transform >> reduce' jrf - automatic parallelization 79
  63. • Internally, jrf does the following: 1. Dry-runs the 1st

    JSON object for each stage to find the first few stages that can be parallelized. 2. Calls fork(2) and spawns workers that process those stages in parallel. 3. Each worker emits its result as NDJSON to a pipe. 4. The main process reads from the pipes and feeds the input to the remaining stages. jrf - automatic parallelization 80
  64. • min: ◦ jq -s 'map(."rtt.minimum") | min' ◦ jq

    -n 'reduce inputs."rtt.minimum" as $x (null; if . == null or $x < . then $x else . end)' ◦ jrf 'min(_["rtt.minimum"])' ◦ jrf -P 10 'min(_["rtt.minimum"])' jrf - benchmark 81
  65. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s                                           (out of memory)
    jq -n
    jrf
    jrf -P 10
    82
  66. • TTLB percentile delta: ◦ jq -n ' include "helpers";

    0.1 as $step | reduce inputs as $row ( {"baseline": [], "jumpstart": [], "rapid-no-jump": [], "rapidstart": []}; if ($row | base_cond(200000; 400000)) then .[$row | group_name] += [$row.ttlb] else . end ) | with_entries(.value |= percentiles($step)) | .baseline as $baseline | with_entries(select(.key != "baseline")) | with_entries( .value |= [range(0; length) as $i | (.[$i] / $baseline[$i] - 1)] )' jrf - benchmark 83
  67. • TTLB percentile delta: ◦ jrf 'select(base_cond(_, 200000, 400000)) >>

    [group_name(_), _["ttlb"]] >> group_by(_[0]) { percentile(_[1], $perc ||= 0.05.step(0.95, 0.1)) } >> map_values{|arr| arr.zip(_["baseline"]).map {|v,bv| v.to_f / bv - 1 } } >> _.reject{|k| k == "baseline"}' ◦ jrf -P 10 '...(same as above)...' jrf - benchmark 84
  68. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s       7.93              8.52              (out of memory)
    jq -n       7.45              13.59             667.44            > 1800
    jrf         2.29              2.39              226.80            240.92
    jrf -P 10   2.27              2.38              31.41             31.69
    85
  69. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s       7.93              8.52              (out of memory)
    jq -n       7.45              13.59             667.44            > 1800
    jrf         2.29              2.39              226.80            240.92
    jrf -P 10   2.27              2.38              31.41             31.69
    speedup     3.3x              3.6x              21x               > 50x
    86
  70. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s       7.93              8.52              (out of memory)
    jq -n       7.45              13.59             667.44            > 1800
    jrf         2.29              2.39              226.80            240.92
    jrf -P 10   2.27              2.38              31.41             31.69
    speedup     3.3x              3.6x              21x               > 50x
    (jrf -P 10 over 81.4GB ≒ 2.6GB/s)
    87
  71. • Written 99.9% by Codex and Claude ◦ Required thorough

    human design review; otherwise, AI often broke the design structure that ensures efficiency • Productivity and correctness improved thanks to: ◦ AI generating the engine (jrf) and its test suite ◦ Humans and AI writing jrf queries in the DSL, which are declarative, concise, and easier to understand and maintain jrf - use of AI 88
  72. • Now that we have the tool, how do we

    present the TTLBs as charts? ◦ next slide shows an example (of mine) from IETF 121 Visualization of A/B tests 89
  73. • Now that we have the tool, how do we

    present the TTLBs as charts? ◦ apparently, 2D charts using percentiles / TTLB do not work Visualization of A/B tests 91
  74. • Now that we have the tool, how do we

    present the TTLBs as charts? ◦ apparently, 2D charts using percentiles / TTLB do not work • The answer is to use: ◦ vertical axis: percentiles ◦ Horizontal axis: percentage delta of TTLB Visualization of A/B tests 92
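The chart's data layout follows directly from this choice of axes: for each percentile p, plot p on the vertical axis and the TTLB delta versus baseline (in %) on the horizontal axis. A sketch with a crude nearest-rank percentile; the helper names are mine, not jrf's.

```ruby
# Nearest-rank percentile over a pre-sorted array.
def percentile(sorted, p)
  sorted[((sorted.length - 1) * p).round]
end

# For each percentile, pair it with the % delta of the variant's
# TTLB against the baseline's TTLB at that same percentile.
def ttlb_delta_curve(baseline, variant, percentiles)
  b = baseline.sort
  v = variant.sort
  percentiles.map do |p|
    [p, (percentile(v, p).to_f / percentile(b, p) - 1) * 100]
  end
end

# A variant uniformly 10% faster yields a flat curve near -10%:
ttlb_delta_curve([100, 200, 300, 400], [90, 180, 270, 360], [0.0, 0.5, 1.0])
```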
  75. • All POPs • All objects ≥ 200KB • With

    Rapid Start, TTLB is reduced by 14.7% Note: the sawtooth at the lower percentiles is due to the clock granularity being 1ms TTLB Reduction: Global 93
  76. 95 • Global data for different size bins: 200KB -

    400KB / 400KB - 800KB / 800KB - 1.6MB / 1.6MB - 3.2MB • TTLB reduction: 10.6% (1.6MB - 3.2MB) ~ 14.9% (200KB - 400KB) TTLB Reduction: by Object Size Bin
  77. Packet Loss Ratio: Global 96

         slow start (baseline)  jumpstart  rapid-no-jump  rapidstart
    avg. 1.52%                  1.61%      1.92%          1.98%
    P50  0.62%                  0.62%      0.90%          0.85%
    P90  4.36%                  4.57%      4.99%          5.06%
    P99  13.80%                 14.22%     14.97%         15.55%
  78. 97 Packet Loss Ratio: per-POP POP with largest P99 PLR:

    • slow start: 19.60% • jumpstart: 20.05% • rapid-no-jump: 20.99% • rapid start: 21.65%
  79. 98 TTLB Reduction: per-POP • TTLB reduction: 10.8% ~ 21.5%

    But why is the shape different for North America? To find an answer, you’d chat with AI and run tens of queries: such iteration is only possible with jrf.
  80. • To analyze logs, it is paramount to have an

    intuitive query DSL that runs fast: ◦ easy to run ad-hoc queries ◦ no need to set up & maintain query infrastructure jrf for fast log analysis 100
  81. • To analyze logs, it is paramount to have an

    intuitive query DSL that runs fast: ◦ easy to run ad-hoc queries ◦ no need to set up & maintain query infrastructure • jrf is an NDJSON query program ◦ with a DSL based on and extensible using ruby ◦ runs 20x+ faster than jq ▪ 2.6GB/sec on a 10-core CPU jrf for fast log analysis 101
  82. • Ruby is a powerful tool for writing DSL executors:

    ◦ the syntax is DSL friendly ◦ the entire workflow can be JIT-compiled ◦ has highly optimized libraries (e.g., JSON) Ruby for optimized tooling 102
  83. • Ruby is a powerful tool for writing DSL executors:

    ◦ the syntax is DSL friendly ◦ the entire workflow can be JIT-compiled ◦ has highly optimized libraries (e.g., JSON) • AI has made it much easier to build well-tested DSL executors. Relying on them lets us work at a higher level, improving productivity without having to trust untested AI-written code to do the right thing. Ruby for optimized tooling 103
  84. • To visualize network-related performance tests, consider using 2D charts

    that: ◦ for the vertical axis, use percentiles ◦ for the horizontal axis, use the % delta from baseline Visualizing network perf tests 104
  85. • TLS/1.3 and QUIC reduced handshake latency • Next step

    is reducing TTLB: ◦ Rapid Start replaces Slow Start, and reduces TTLB by 14.7% globally (>=200KB objects) ◦ Ruby was an essential tool for developing Rapid Start Rapid Start 105
  86. • TLS/1.3 and QUIC reduced handshake latency • Next step

    is reducing TTLB: ◦ Rapid Start replaces Slow Start, and reduces TTLB by 14.7% globally (>=200KB objects) ◦ Ruby was an essential tool for developing Rapid Start Rapid Start Ruby is making the Web faster! 106