kazuho

April 22, 2026
Rapid Start: Faster Internet Connections, with Ruby's Help

Presented at Rubykaigi 2026.


Transcript

  1. • lead developer of the H2O HTTP server ▪ used

    by Fastly ▪ has its own HTTP/1, 2, 3, TLS/1.3, QUIC implementation ▪ supports HTTP routing using mruby (Rack) • as a hobby programmer: ◦ rat (ruby-based IPv4 NAT) • as a co-author of RFCs: ◦ RFC 8297 (HTTP 103 Early Hints) ◦ RFC 9218 (HTTP Extensible Priorities) ◦ RFC 9849 (TLS Encrypted Client Hello) Who am I 2
  2. • Rapid Start ◦ Fastly’s new startup algorithm for its

    congestion control • jrf - our ruby-based tool for log analysis • Visualization of network-related performance tests Topics 3
  3. • TCP+TLS/1.2: ◦ full handshake: 3 RT ◦ resumption: 2

    RT • TCP+TLS/1.3: ◦ full handshake: 2 RT ◦ resumption: 1 RT • QUIC: ◦ full handshake: 1 RT ◦ resumption: 0 RT ※RT = number of round-trips Time to establish a connection 5
  4. • TCP+TLS/1.2: ◦ full handshake: 3 RT ◦ resumption: 2

    RT • TCP+TLS/1.3: ◦ full handshake: 2 RT ◦ resumption: 1 RT • QUIC: ◦ full handshake: 1 RT ◦ resumption: 0 RT ※RT = number of round-trips Time to establish a connection HTTP/3 6
  5. • With HTTP/3, handshake latency is minimized: ◦ full handshake:

    1 RT ◦ resumption: 0 RT Reducing the latency of HTTP 7
  6. • With HTTP/3, handshake latency is minimized: ◦ full handshake:

    1 RT ◦ resumption: 0 RT • Time To First Byte (TTFB) is: ◦ full handshake: 2 RT ◦ resumption: 1 RT Reducing the latency of HTTP 8
  7. • With HTTP/3, handshake latency is minimized: ◦ full handshake:

    1 RT ◦ resumption: 0 RT • Time To First Byte (TTFB) is: ◦ full handshake: 2 RT ◦ resumption: 1 RT • What about Time To Last Byte (TTLB)? ◦ TTLB is typically TTFB plus a transfer time governed by the speed of Slow Start Reducing the latency of HTTP 9
  8. • Initial phase of congestion control: ◦ used when the

    available bandwidth is unknown ◦ to quickly determine the available bandwidth Slow Start 10
  9. • Initial phase of congestion control: ◦ used when the

    available bandwidth is unknown ◦ to quickly determine the available bandwidth • Start by sending IW packets: ◦ IW = 10 (RFC), 30 (real-world) ◦ send 2x more for each ack received Slow Start 11
  10. • Initial phase of congestion control: ◦ used to quickly

    fill the available bandwidth, which is unknown at the beginning of the connection • Starts by sending IW packets: ◦ IW = 10 (RFC), 30 (real-world) ◦ send 2x more for each ack received • When packets are dropped (i.e., the network overflows), slow start enters “recovery” to repair lost packets, then congestion control switches to the second phase, known as congestion avoidance Slow Start 12
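The exponential growth described above can be sketched in a few lines of Ruby. This is a minimal illustration of the doubling rule, not the H2O implementation; `rounds_to_reach` is a name of my own.

```ruby
# Slow Start growth: start at IW packets in flight, then double
# each round-trip until the window covers the target packet count.
def rounds_to_reach(target_packets, iw: 30)
  cwnd = iw     # first round-trip sends IW packets
  rounds = 1
  while cwnd < target_packets
    cwnd *= 2   # "send 2x more for each ack received"
    rounds += 1
  end
  rounds
end

# From IW = 30, reaching a 209-packet target takes 4 round-trips
# (30 -> 60 -> 120 -> 240):
rounds_to_reach(209)  # => 4
```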
  11. Slow Start and BDP [packet diagram: sender, queue, bottleneck link]

    Idle BDP: number of packets needed to fully utilize the bottleneck link without building queue. Queue builds up when packets arrive faster than the bottleneck link. When the queue overflows, packets are dropped.
  12. • Idle BDP = 55Mb/s * 0.039s [packet diagram: idle BDP plus queue before the bottleneck link] Slow Start and BDP

  13. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets [packet diagram] Slow Start and BDP

  14. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: [packet diagram] Slow Start and BDP

  15. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets [packet diagram] Slow Start and BDP

  16. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets ◦ 2RT: 60 packets [packet diagram] Slow Start and BDP

  17. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets ◦ 2RT: 60 packets ◦ 3RT: 120 packets [packet diagram] Slow Start and BDP

  18. • Idle BDP = 55Mb/s * 0.039s ≒ 2.15Mb ≒ 268KB ≒ 209 packets • With Slow Start: ◦ 1RT: 30 packets ◦ 2RT: 60 packets ◦ 3RT: 120 packets ◦ 4RT: 240 packets [packet diagram: bottleneck link is finally saturated] Slow Start and BDP
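The slides' idle-BDP arithmetic can be reproduced directly. A small sketch; `idle_bdp_packets` is my own helper, and the ~1280-byte packet size is my assumption (the deck only states "≒ 209 packets", and 268KB / 209 works out to roughly a full-size QUIC datagram).

```ruby
# Idle BDP = bandwidth x RTT, expressed in packets.
def idle_bdp_packets(bandwidth_bps, rtt_sec, packet_bytes: 1280)
  bytes = bandwidth_bps * rtt_sec / 8.0  # bits -> bytes
  (bytes / packet_bytes).floor
end

idle_bdp_packets(55_000_000, 0.039)  # => 209
```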
  19. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator 22
  20. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator 23
  21. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator nothing is received, as the sender stops initial sending after 0.5 RTT 24
  22. Vertical axis: bytes sent / acked (cumulative) Horizontal axis: time

    elapsed (milliseconds) Black dot: packet sent Yellow dot: ack received (reflects when the receiver received packets) This network on simulator nothing is received, as the sender stops initial sending after 0.5 RTT underutilization 25
  23. Impact of queuing and drops (VDSL) [chart annotations: moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow] 29
  24. Impact of queuing and drops (VDSL) [chart annotations: moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st] 30
  25. Impact of queuing and drops (VDSL) [chart annotations: data cannot be used until packet drops are repaired / moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st] 31
  26. Impact of queuing and drops (VDSL) [chart annotations: data cannot be used until packet drops are repaired / moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st / excess draining followed by a burst] 32
  27. • Utilize the available bandwidth as soon as possible ◦

    Initial window larger than 30 packets ◦ More aggressive growth than 2x per RTT Think of an ideal startup 33
  28. • Utilize the available bandwidth as soon as possible ◦

    Initial window larger than 30 packets ◦ More aggressive growth than 2x per RTT • Minimize the negative impact of packet drops ◦ To avoid drops in short transmissions, delay the initial drop as late as possible ◦ Reduce the number of recovery events ◦ Reduce the number of packets dropped per recovery Think of an ideal startup 34
  29. • Utilize the available bandwidth as soon as possible ◦

    Initial window larger than 30 packets ◦ More aggressive growth than 2x per RTT • Minimize the negative impact of packet drops ◦ To avoid drops in short transmissions, delay the initial drop as late as possible ◦ Reduce the number of recovery events ◦ Reduce the number of packets dropped per recovery • To mitigate the risk of overflowing a queue other than that immediately before the bottleneck, avoid bursty sending Think of an ideal startup 35
  30. • Proposed by Fastly (1st Internet-Draft submitted in Nov 2025)

    ※CWND: estimate of the full BDP (idle BDP + queue capacity)

                     Slow Start            Rapid Start
    initial sending  stops after 0.5 RTT   stops after 1 RTT
    increase         2x per RTT            3x per RTT (switches to 2x when observing queue buildup)
    recovery         CWND *= 0.5           determine CWND based on packet drop ratio

    Rapid Start 36
  31. • Slow Start: ◦ Sends IW packets for 0.5 RTT

    • Rapid Start: ◦ Sends 2x IW packets for 1 RTT Rapid Start: initial sending 37
  32. • Slow Start: ◦ Sends IW packets for 0.5 RTT

    • Rapid Start: ◦ Sends 2x IW packets for 1 RTT ◦ Risk: potential queue buildup and earlier packet drops ▪ But no more bursty than Slow Start, as the interval between each packet sent remains the same Rapid Start: initial sending 38
  33. • Slow Start: 2x • Rapid Start: ◦ queue_buildup? (see note) ?

    2x : 3x Note: recommended queue-buildup threshold is: rtt_floor > min(rtt_min + 4ms, rtt_min * 1.1), where rtt_floor is the smallest RTT observed over the most recent 1RT Rapid Start: CWND increase 39
  34. • Slow Start: 2x • Rapid Start: ◦ queue_buildup? (see note) ?

    2x : 3x ◦ rationale: queue buildup is an outcome of the sender sending faster than the bottleneck link ▪ slower increase delays the chance of packet drops Note: recommended queue-buildup threshold is: rtt_floor > min(rtt_min + 4ms, rtt_min * 1.1), where rtt_floor is the smallest RTT observed over the most recent 1RT Rapid Start: CWND increase 40
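The growth-rate selection above can be written out as a short sketch. The threshold is the one recommended in the slide's note; the method names (`queue_buildup?`, `growth_factor`) are mine, not the draft's.

```ruby
# Detect queue buildup from RTT inflation: the smallest RTT seen
# in the most recent round-trip (rtt_floor) exceeding the minimum
# RTT plus some slack suggests a queue is forming.
QUEUE_BUILDUP_SLACK_SEC = 0.004  # 4ms, per the recommended threshold

def queue_buildup?(rtt_floor, rtt_min)
  rtt_floor > [rtt_min + QUEUE_BUILDUP_SLACK_SEC, rtt_min * 1.1].min
end

# Rapid Start: grow 3x per RTT, falling back to 2x once queue
# buildup is observed (slower increase delays packet drops).
def growth_factor(rtt_floor, rtt_min)
  queue_buildup?(rtt_floor, rtt_min) ? 2 : 3
end

growth_factor(0.010, 0.010)  # => 3  (no buildup observed)
growth_factor(0.012, 0.010)  # => 2  (RTT floor inflated past threshold)
```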
  35. Recap: Impact of queuing and drops [chart annotations: data cannot be used until packet drops are repaired / moments of idle / queuing due to bursty sending / packet drops, and hence recoveries, due to queue overflow / 2nd recovery happens almost immediately after the 1st / excess draining followed by a burst] 42
  36. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) Slow Start’s recovery problem 45
  37. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) ◦ As Slow Start increases CWND by 2x per RTT, CWND ~ 2x full_BDP when observing a drop Slow Start’s recovery problem 46
  38. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) ◦ As Slow Start increases CWND by 2x per RTT, CWND ~ 2x full_BDP when observing a drop ◦ Reducing CWND to half yields the full BDP, and therefore congestion control immediately fills the bottleneck again Slow Start’s recovery problem 47
  39. • Slow Start enters 2nd recovery almost immediately, because: ◦

    Packet drops are observed 1 RT after overflow (i.e., when CWND ~ full BDP) ◦ As Slow Start increases CWND by 2x per RTT, CWND ~ 2x full_BDP when observing a drop ◦ Reducing CWND to half yields the full BDP, and therefore congestion control immediately fills the bottleneck again • Reducing CWND to ¼ is not a good solution, because that would fully drain the queue, leading to underutilization of the bottleneck link Slow Start’s recovery problem 48
  40. • For each ack or packet drop, gradually decrease CWND,

    so that the CWND at recovery exit becomes: 0.5 * bytes_acked_in_recovery ◦ because the bytes acked in 1 RT reflect the full BDP • Benefits: ◦ Works regardless of the increase ratio ◦ As CWND is gradually reduced, transmission resumes before the queue is fully drained Rapid Start: recovery 49
  41. • Upon entering recovery: cwnd *= 5/6 • For each

    ACK: cwnd -= 1/3 * bytes_newly_acked • For each loss: cwnd -= 5/6 * bytes_newly_lost See draft-kazuho-ietf-rapid-start-02 for how these constants are derived Rapid Start: recovery 50
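The three recovery rules above translate directly into code. This is a minimal sketch of the bookkeeping only (the class name and structure are mine; see the draft for the derivation of the constants):

```ruby
# Rapid Start recovery: cut CWND to 5/6 on entry, then shrink it
# gradually per ack and per loss, so it converges toward
# 0.5 * bytes_acked_in_recovery by recovery exit.
class RapidStartRecovery
  attr_reader :cwnd

  def initialize(cwnd)
    @cwnd = cwnd * 5.0 / 6          # upon entering recovery
  end

  def on_ack(bytes_newly_acked)
    @cwnd -= bytes_newly_acked / 3.0
  end

  def on_loss(bytes_newly_lost)
    @cwnd -= bytes_newly_lost * 5.0 / 6
  end
end

r = RapidStartRecovery.new(6000)
r.cwnd         # => 5000.0
r.on_ack(300)
r.cwnd         # => 4900.0
```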
  42. Rapid Start on Simulator (VDSL) [chart annotations: enters recovery only once, but takes longer to repair drops due to 3x overshoot / no idle moments / queue buildup / packet drops] 56
  43. Rapid Start on Simulator (VDSL) [chart annotations: enters recovery only once, but takes longer to repair drops due to 3x overshoot / no idle moments / queue buildup / packet drops / lands at the right queue depth] 57
  44. • HTTP/3 connections divided into 4 groups • For connections

    serving cached objects >= 200KB as the first request, record transport-level statistics and TTLB when all bytes for that cached object are acked • for 1 week on 7 POPs across the globe: East / SE Asia, East / West Europe, Africa, North / South America

    Setup: divided into 4 groups
                          initial sending     increase  recovery
    baseline (slow start) 30 pkts in 0.5 RTT  2x        CWND *= 0.5
    jumpstart             60 pkts in 1 RTT    2x        CWND *= 0.5
    rapid-wo-jump         30 pkts in 0.5 RTT  3x / 2x   CWND reduced relative to loss ratio
    rapidstart            60 pkts in 1 RTT    3x / 2x   CWND reduced relative to loss ratio
    59
  45. {"module":"h2o","type":"h3s_stream0_ttlb","tid":397502,"time":1773026516280,"conn_id":1907798,"method":"GET","content_length":226578,"ttlb":364,"num-packets.received":24,"num-packets.decryption-failed":0,"num-packets.sent":191,"num-packets.lost":0,"num-packets.lost-time-threshold":0,"num-packets.ack-received":191,"num-packets.late-acked":0,"num-packets.initial-received":2,"num-packets.zero-rtt-received":0,"num-packets.handshake-received":2,"num-packets.initial-sent":1,"num-packets.zero-rtt-sent":0,"num-packets.handshake-sent":4,"num-packets.received-out-of-order":0,"num-packets.received-ecn-ect0":0,"num-packets.received-ecn-ect1":0,"num-packets.received-ecn-ce":0,"num-packets.acked-ecn-ect0":0,"num-packets.acked-ecn-ect1":0,"num-packets.acked-ecn-ce":0,"num-packets.sent-promoted-paths":0,"num-packets.ack-received-promoted-paths":0,"num-packets.max-delayed":0,"num-packets.delayed-used":0,"num-bytes.received":4737,"num-bytes.sent":236943,"num-bytes.lost":0,"num-bytes.ack-received":236895,"num-bytes.stream-data-sent":231728,"num-bytes.stream-data-resent":226,"num-frames-received.padding":3259,"num-frames-received.ping":1,"num-frames-received.ack":19,"num-frames-received.reset_stream":0,"num-frames-received.stop_sending":0,"num-frames-received.crypto":2,"num-frames-received.new_token":0,"num-frames-received.stream":2,"num-frames-received.max_data":0,"num-frames-received.max_stream_data":0,"num-frames-received.max_streams_bidi":0,"num-frames-received.max_streams_uni":0,"num-frames-received.data_blocked":0,"num-frames-received.stream_data_blocked":0,"num-frames-received.streams_blocked":0,"num-frames-received.new_connection_id":0,"num-frames-received.retire_connection_id":0,"num-frames-received.path_challenge":0,"num-frames-received.path_response":0,"num-frames-received.transport_close":0,"num-frames-received.application_close":0,"num-frames-received.handshake_done":0,"num-frames-received.datagram":0,"num-frames-received.ack_frequency":0,"num-frames-received.immediate_ack":0,"num-frames-sent.padding":0,"num-frames-sent.ping":1,"num-frames-sent.ack":3,"num-frames-sent.reset_stream":0,"num-frames-sent.stop_sending":0,"num-frames-sent.crypto":7,"num-frames-sent.new_token":2,"num-frames-sent.stream":188,"num-frames-sent.max_data":0,"num-frames-sent.max_stream_data":0,"num-frames-sent.max_streams_bidi":0,"num-frames-sent.max_streams_uni":0,"num-frames-sent.data_blocked":0,"num-frames-sent.stream_data_blocked":0,"num-frames-sent.streams_blocked":0,"num-frames-sent.new_connection_id":6,"num-frames-sent.retire_connection_id":0,"num-frames-sent.path_challenge":0,"num-frames-sent.path_response":0,"num-frames-sent.transport_close":0,"num-frames-sent.application_close":0,"num-frames-sent.handshake_done":1,"num-frames-sent.datagram":0,"num-frames-sent.ack_frequency":0,"num-frames-sent.immediate_ack":0,"num-paths.created":0,"num-paths.validated":0,"num-paths.validation-failed":0,"num-paths.migration-elicited":0,"num-paths.promoted":0,"num-paths.closed-no-dcid":0,"num-paths.ecn-validated":0,"num-paths.ecn-failed":1,"num-ptos":1,"num-handshake-timeouts":0,"num-initial-handshake-exceeded":0,"num-jumpstart-applicable":1,"quic.jumpstart.applicable":1,"num-rapid-start":0,"num-paced":1,"num-respected-app-limited":0,"handshake-confirmed-msec":369,"jumpstart.prev-rate":0,"jumpstart.prev-rtt":0,"jumpstart.new-rtt":106,"jumpstart.cwnd":0,"quic.jumpstart.time-to-idle":647,"token-sent.at":0,"token-sent.rate":579889,"token-sent.rtt":67,"rtt.minimum":66,"rtt.smoothed":81,"rtt.variance":19,"rtt.latest":75,"loss-thresholds.use-packet-based":1,"loss-thresholds.time-based-percentile":128,"cc.cwnd":273280,"cc.ssthresh":4294967295,"cc.cwnd-initial":44160,"cc.cwnd-exiting-slow-start":0,"cc.exit-slow-start-at":9223372036854775807,"cc.cwnd-exiting-jumpstart":0,"cc.cwnd-minimum":4294967295,"cc.cwnd-maximum":273280,"cc.num-loss-episodes":0,"cc.num-ecn-loss-episodes":0,"delivery-rate.latest":210149,"delivery-rate.smoothed":739154,"delivery-rate.stdev":1078961,"num-sentmap-packets-largest":89}

    Example: stats for 1 connection 60
  46. • Size of the dataset in 1 experiment: ◦ LDJSON

    of 20M lines; 80GB (3.7GB in .gz) Analyzing data 61
  47. • Size of the dataset in 1 experiment: ◦ LDJSON

    of 20M lines; 80GB (3.7GB in .gz) • Need to apply various ad-hoc queries: ◦ jq is the obvious choice, however… Analyzing data 62
  48. • The grammar is not intuitive • Slow • Not

    suited for processing huge LDJSON ◦ Example: | min buffers the entire input ◦ whereas log analysis is almost always a streaming, map-reduce-like operation over huge data Issues with jq 63
  49. jq -s '{ "min": (map(."rtt.minimum") | min), "max": (map(."rtt.minimum") |

    max), "avg": (map(."rtt.minimum") | add / length) }' min/max/avg over rtt.minimum 64
  50. jq -s '{ "min": (map(."rtt.minimum") | min), "max": (map(."rtt.minimum") |

    max), "avg": (map(."rtt.minimum") | add / length) }' min/max/avg over rtt.minimum -s buffers the entire input; jq essentially stops working when the input is larger than RAM size 65
  51. jq -n ' reduce inputs as $o ( {min: null,

    max: null, sum: 0, n: 0}; ($o."rtt.minimum") as $x | { min: (if .min == null or $x < .min then $x else .min end), max: (if .max == null or $x > .max then $x else .max end), sum: (.sum + $x), n: (.n + 1) } ) | { min, max, avg: (.sum / .n) }' min/max/avg over rtt.minimum With -n, each JSON object is processed separately; but aggregation logic needs to be hand-written 66
  52. • Streaming processing is easy to write • JSON parser

    is fast • The script is JIT-compiled Writing ruby scripts instead 67
  53. • Streaming processing is easy to write • JSON parser

    is fast • The script is JIT-compiled • However: ◦ It becomes too long as a one-liner ◦ Ends up as a script with many, many options ▪ Hard to maintain Writing ruby scripts instead 68
  54. • Streaming processing is easy to write • JSON parser

    is fast • The script is JIT-compiled • However: ◦ It becomes too long as a one-liner ◦ Ends up as a script with many, many options ▪ Hard to maintain ◦ Letting AI write it is an option, but how would you verify that your ad-hoc query was converted to correct code? Writing ruby scripts instead 69
  55. • more SQL-like grammar + ruby DSL • compile the

    query language using eval ◦ let JIT optimize the runtime and the query altogether Writing jq (improved) in ruby 71
  56. • more SQL-like grammar + ruby DSL • compile the

    query language using eval ◦ let JIT optimize the runtime and the query altogether • Streaming processing of NDJSON Writing jq (improved) in ruby 72
  57. # Filter then extract jrf 'select(_["x"] > 10) >> _["foo"]'

    # Aggregate jrf 'select(_["item"] == "Apple") >> sum(_["count"])' jrf 'percentile(_["ttlb"], 0.50)' # Group by key and aggregate jrf 'group_by(_["item"]) { |row| sum(row["count"] * row["price"]) }' jrf 74
  58. • Syntax: stages connected using >> ◦ Each stage is

    just a ruby block • Filter: ◦ select(expr) • Transform: ◦ _["foo"] • Aggregation: ◦ min(expr), max(expr), sum(expr), … ◦ reduce(initial) { any ruby code } jrf 75
  59. class Stage def initialize(block, src: nil) ... @ctx =

    Class.new(RowContext) do define_method(:__jrf_expr__, &block) end end end # instantiated as: Stage.new(eval("proc { #{stage[:src]} }", ...)) jrf - internals Each stage expression is converted to a method, and gets called 76
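A runnable miniature of the pattern on this slide (my reconstruction, not jrf's actual code): the stage source is eval'd into a proc, turned into a method on a per-stage row-context class, and then runs as plain Ruby methods that the JIT can optimize together with the runtime.

```ruby
# Minimal row context: `_` exposes the current row, so stage
# expressions like `_["x"] * 2` read naturally.
class RowContext
  def initialize(row)
    @row = row
  end

  def _
    @row
  end
end

# Compile a stage expression: eval the source into a proc, then
# make it a real method on an anonymous RowContext subclass.
# define_method rebinds `self`, so `_` resolves to the instance.
def compile_stage(src)
  block = eval("proc { #{src} }")
  Class.new(RowContext) { define_method(:__jrf_expr__, &block) }
end

stage = compile_stage('_["x"] * 2')
stage.new({ "x" => 21 }).__jrf_expr__  # => 42
```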
  60. • In typical log processing: ◦ filtering and transformation happen

    before aggregation ◦ logs are split into multiple files jrf -P 10 'filter >> transform >> reduce' jrf - automatic parallelization 77
  61. • In typical log processing: ◦ filtering and transformation happen

    before aggregation ◦ logs are split into multiple files • Therefore, processing of each file can be parallelized for: ◦ filtering and transformations in stages upfront jrf -P 10 'filter >> transform >> reduce' jrf - automatic parallelization 78
  62. • In typical log processing: ◦ filtering and transformation happen

    before aggregation ◦ logs are split into multiple files • Therefore, processing of each file can be parallelized for: ◦ filtering and transformations in stages upfront ◦ certain aggregations (e.g., min, max, sum) ▪ each thread calculates its own, then the results are merged jrf -P 10 'filter >> transform >> reduce' jrf - automatic parallelization 79
  63. • Internally, jrf does the following: 1. Dry-runs the 1st

    JSON object for each stage to find the first few stages that can be parallelized. 2. Calls fork(2) and spawns workers that process those stages in parallel. 3. Each worker emits its result as NDJSON to a pipe. 4. The main process reads from the pipes and feeds the input to the remaining stages. jrf - automatic parallelization 80
  64. • min: ◦ jq -s 'map(."rtt.minimum") | min' ◦ jq

    -n 'reduce inputs."rtt.minimum" as $x (null; if . == null or $x < . then $x else . end)' ◦ jrf 'min(_["rtt.minimum"])' ◦ jrf -P 10 'min(_["rtt.minimum"])' jrf - benchmark 81
  65. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s                                           (out of memory)
    jq -n
    jrf
    jrf -P 10
    82
  66. • TTLB percentile delta: ◦ jq -n ' include "helpers";

    0.1 as $step | reduce inputs as $row ( {"baseline": [], "jumpstart": [], "rapid-no-jump": [], "rapidstart": []}; if ($row | base_cond(200000; 400000)) then .[$row | group_name] += [$row.ttlb] else . end ) | with_entries(.value |= percentiles($step)) | .baseline as $baseline | with_entries(select(.key != "baseline")) | with_entries( .value |= [range(0; length) as $i | (.[$i] / $baseline[$i] - 1)] )' jrf - benchmark 83
  67. • TTLB percentile delta: ◦ jrf 'select(base_cond(_, 200000, 400000)) >>

    [group_name(_), _["ttlb"]] >> group_by(_[0]) { percentile(_[1], $perc ||= 0.05.step(0.95, 0.1)) } >> map_values{|arr| arr.zip(_["baseline"]).map {|v,bv| v.to_f / bv - 1 } } >> _.reject{|k| k == "baseline"}' ◦ jrf -P 10 '...(same as above)...' jrf - benchmark 84
  68. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s       7.93              8.52              (out of memory)
    jq -n       7.45              13.59             667.44            > 1800
    jrf         2.29              2.39              226.80            240.92
    jrf -P 10   2.27              2.38              31.41             31.69
    85
  69. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s       7.93              8.52              (out of memory)
    jq -n       7.45              13.59             667.44            > 1800
    jrf         2.29              2.39              226.80            240.92
    jrf -P 10   2.27              2.38              31.41             31.69
    speedup     3.3x              3.6x              21x               > 50x
    86
  70. jrf - benchmark (all units in seconds)

                950MB (single file)                 81.4GB (29 files)
                min(rtt.minimum)  TTLB pct. delta   min(rtt.minimum)  TTLB pct. delta
    jq -s       7.93              8.52              (out of memory)
    jq -n       7.45              13.59             667.44            > 1800
    jrf         2.29              2.39              226.80            240.92
    jrf -P 10   2.27              2.38              31.41             31.69
    speedup     3.3x              3.6x              21x               > 50x
    (jrf -P 10 over 81.4GB ≒ 2.6GB/s)
    87
  71. • Written 99.9% by Codex and Claude ◦ Required thorough

    human design review; otherwise, AI often broke the design structure that ensures efficiency • Productivity and correctness improved thanks to: ◦ AI generating the engine (jrf) and its test suite ◦ Humans and AI writing jrf queries in the DSL, which are declarative, concise, and easier to understand and maintain jrf - use of AI 88
  72. • Now that we have the tool, how do we

    present the TTLBs as charts? ◦ next slide shows an example (of mine) from IETF 121 Visualization of A/B tests 89
  73. • Now that we have the tool, how do we

    present the TTLBs as charts? ◦ apparently, 2D charts using percentiles / TTLB do not work Visualization of A/B tests 91
  74. • Now that we have the tool, how do we

    present the TTLBs as charts? ◦ apparently, 2D charts using percentiles / TTLB do not work • The answer is to use: ◦ vertical axis: percentiles ◦ Horizontal axis: percentage delta of TTLB Visualization of A/B tests 92
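The chart's data layout follows directly from this choice of axes: for each percentile p, plot p on the vertical axis and the TTLB delta versus baseline (in %) on the horizontal axis. A sketch with a crude nearest-rank percentile; the helper names are mine, not jrf's.

```ruby
# Nearest-rank percentile over a pre-sorted array.
def percentile(sorted, p)
  sorted[((sorted.length - 1) * p).round]
end

# For each percentile, pair it with the % delta of the variant's
# TTLB against the baseline's TTLB at that same percentile.
def ttlb_delta_curve(baseline, variant, percentiles)
  b = baseline.sort
  v = variant.sort
  percentiles.map do |p|
    [p, (percentile(v, p).to_f / percentile(b, p) - 1) * 100]
  end
end

# A variant uniformly 10% faster yields a flat curve near -10%:
ttlb_delta_curve([100, 200, 300, 400], [90, 180, 270, 360], [0.0, 0.5, 1.0])
```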
  75. • All POPs • All objects ≥ 200KB • With

    Rapid Start, TTLB is reduced by 14.7% Note: the sawtooth at the lower percentiles is due to the clock granularity being 1ms TTLB Reduction: Global 93
  76. 95 • Global data for different size bins: 200KB -

    400KB / 400KB - 800KB / 800KB - 1.6MB / 1.6MB - 3.2MB • TTLB reduction: 10.6% (1.6MB - 3.2MB) ~ 14.9% (200KB - 400KB) TTLB Reduction: by Object Size Bin
  77. Packet Loss Ratio: Global 96

         slow start (baseline)  jumpstart  rapid-no-jump  rapidstart
    avg. 1.52%                  1.61%      1.92%          1.98%
    P50  0.62%                  0.62%      0.90%          0.85%
    P90  4.36%                  4.57%      4.99%          5.06%
    P99  13.80%                 14.22%     14.97%         15.55%
  78. 97 Packet Loss Ratio: per-POP POP with largest P99 PLR:

    • slow start: 19.60% • jumpstart: 20.05% • rapid-no-jump: 20.99% • rapid start: 21.65%
  79. 98 TTLB Reduction: per-POP • TTLB reduction: 10.8% ~ 21.5%

    But why is the shape different for North America? To find an answer, you’d chat with AI and run tens of queries: such iteration is only possible with jrf.
  80. • To analyze logs, it is paramount to have an

    intuitive query DSL that runs fast: ◦ easy to run ad-hoc queries ◦ no need to set up & maintain query infrastructure jrf for fast log analysis 100
  81. • To analyze logs, it is paramount to have an

    intuitive query DSL that runs fast: ◦ easy to run ad-hoc queries ◦ no need to set up & maintain query infrastructure • jrf is an NDJSON query program ◦ with a DSL based on and extensible using ruby ◦ runs 20x+ faster than jq ▪ 2.6GB/sec on a 10-core CPU jrf for fast log analysis 101
  82. • Ruby is a powerful tool for writing DSL executors:

    ◦ the syntax is DSL friendly ◦ the entire workflow can be JIT-compiled ◦ has highly optimized libraries (e.g., JSON) Ruby for optimized tooling 102
  83. • Ruby is a powerful tool for writing DSL executors:

    ◦ the syntax is DSL friendly ◦ the entire workflow can be JIT-compiled ◦ has highly optimized libraries (e.g., JSON) • AI has made it much easier to build well-tested DSL executors. Relying on them lets us work at a higher level, improving productivity without having to trust untested AI-written code to do the right thing. Ruby for optimized tooling 103
  84. • To visualize network-related performance tests, consider using 2D charts

    that: ◦ for the vertical axis, use percentiles ◦ for the horizontal axis, use the % delta from baseline Visualizing network perf tests 104
  85. • TLS/1.3 and QUIC reduced handshake latency • Next step

    is reducing TTLB: ◦ Rapid Start replaces Slow Start, and reduces TTLB by 14.7% globally (>=200KB objects) ◦ Ruby was an essential tool for developing Rapid Start Rapid Start 105
  86. • TLS/1.3 and QUIC reduced handshake latency • Next step

    is reducing TTLB: ◦ Rapid Start replaces Slow Start, and reduces TTLB by 14.7% globally (>=200KB objects) ◦ Ruby was an essential tool for developing Rapid Start Rapid Start Ruby is making the Web faster! 106