
Hannes Frederic Sowa on "BBR: Congestion-Based Congestion Control"


TCP congestion control has a large impact on perceived network performance (especially in terms of bandwidth and latency) and thus on the Internet. Two major categories of congestion control algorithms have been explored: those using packet loss and those using packet delay as feedback. Due to historic developments (and the development of packet switching hardware), packet-loss-based congestion control algorithms are commonly used today. We will discuss a congestion control scheme published by Google in 2017.

Papers_We_Love

January 31, 2018


Transcript

  1. BBR: Congestion-Based Congestion Control
    Review of the paper
    "BBR: Congestion-Based Congestion Control"
    by
    Hannes Frederic Sowa

    Papers We Love – NYC 2017


  2. Outline

    Motivation

    Context
    – Historical review of congestion control
    – Overview of in-use congestion control algorithms

    The actual paper at hand

    Some command line commands to feel the paper

    Conclusion and outlook


  3. Motivation

    For networking people maybe one of the more exciting papers
    this decade
    – TCP is one of the most used protocols
    – Even very small improvements to the protocol pay out a lot
    – Especially if they are noticeable by a lot of people

    No advancements in the state of the art of congestion control

    Thus no new approaches have been deployed for a while


  4. Congestive Collapse

    Happened in 1986 on the NSFNet
    – Backbone speed dropped from 32kbit/s to 40bit/s

    Poor retransmission behavior of early implementations
    – Network was stuffed with retransmits

    First implementation of congestion control
    – Designed and implemented by Van Jacobson
    – Deployed up until 1988


  5. Early congestion control

    Congestion Avoidance and Control – Van Jacobson and
    Michael Karels 1988

    Congestion detection based on packet loss
    – Sender slows down its sending rate when it detects packet loss
    – Loss is detected via retransmission timeout (RTO) or
    duplicate ACK packets

    Based on AIMD:
    additive increase multiplicative decrease
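The AIMD rule above can be sketched in a few lines. This is a toy model only, not the algorithm from the 1988 paper: the function names, the one-segment increase, the halving factor, and the "loss whenever the window exceeds capacity" rule are all assumptions chosen for illustration.

```python
def aimd_step(cwnd: float, loss: bool, incr: float = 1.0, decr: float = 0.5) -> float:
    """Advance the congestion window (in segments) by one round trip."""
    if loss:
        # Multiplicative decrease, never going below one segment.
        return max(1.0, cwnd * decr)
    # Additive increase: one more segment per round trip.
    return cwnd + incr


def simulate(rounds: int, capacity: float) -> list:
    """Toy path model (assumption): a loss occurs whenever cwnd exceeds capacity."""
    cwnd = 1.0
    trace = []
    for _ in range(rounds):
        loss = cwnd > capacity
        cwnd = aimd_step(cwnd, loss)
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # Prints the familiar sawtooth: slow climb, sharp drop, slow climb again.
    print(simulate(12, capacity=8))
```

The window only backs off after it has already overshot the path capacity, which is why loss-based schemes interact so strongly with buffer sizes.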


  6. Buffers vs. congestion control

    Visible to users: buffers, when filled, increase the latency of networking operations
    – and don’t drop packets → invisible to loss-based congestion control

    Huge effect on loss-based congestion control
    – If buffers are large, available bandwidth becomes hard to discover for
    loss-based congestion control
        – Tends to keep the buffers filled up and thus creates bufferbloat
        – Dramatically increases latency
    – If buffers are small and packets get dropped early, data streams slow down
        – It could just have been a short burst rather than long-lasting congestion


  7. Current congestion control algorithms

    CUBIC
    – Linux, Mac OS and (soon) Windows default congestion algorithm

    Loss based

    Cubic function

    Compound-TCP
    – Current Windows congestion control algorithm

    Hybrid (TCP-Reno + delay based CC)

    LEDBAT
    – Apple and Bittorrent
    – Delay / ticked based


  8. Latency based congestion control

    Alternative to loss based congestion control

    Observes variation of round trip time and estimates congestion

    Unfortunately: will always be defeated by loss based congestion
    control
    – Thus nearly not used
    – (again, Microsoft uses a hybrid loss/latency based congestion control
    depending on RTT, as does Apple for updates)


  9. BBR: Congestion-Based Congestion Control

    The paper at hand

    Bottleneck-Bandwidth-and-Round-trip propagation time

    Developed by Neal Cardwell, Yuchung Cheng, C. Stephen
    Gunn, Soheil Hassas Yeganeh, Van Jacobson

    Developed and upstreamed into the Linux kernel with additional
    help of Eric Dumazet, Nandita Dukkipati by end of 2016
    – With further enhancements along the way


  10. Analogy physical pipe
    Intermediate devices hidden to TCP – single pipe analogy.
    Slowest link determines overall throughput:
    – RtProp: round-trip propagation time
    – BtlBw: bottleneck bandwidth, i.e. the minimal diameter of the pipe
    inflight = BtlBw · RtProp (bandwidth-delay product: bits/s · s = bits, the maximum amount of data in the network)
    Queue forms at the device with the slowest link
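As a worked example of the bandwidth-delay product, the sketch below computes it for one path. The 100 Mbit/s bottleneck, the 40 ms RtProp, and the 1448-byte segment size are made-up illustration values, and the function names are hypothetical.

```python
import math


def bdp_bits(btlbw_bits_per_s: float, rtprop_s: float) -> float:
    """Bandwidth-delay product: bits/s * s = bits."""
    return btlbw_bits_per_s * rtprop_s


def bdp_segments(btlbw_bits_per_s: float, rtprop_s: float, mss_bytes: int = 1448) -> int:
    """The same quantity expressed in whole segments (MSS value is an assumption)."""
    return math.ceil(bdp_bits(btlbw_bits_per_s, rtprop_s) / 8 / mss_bytes)


if __name__ == "__main__":
    # 100 Mbit/s bottleneck with a 40 ms round-trip propagation time.
    print(bdp_bits(100e6, 0.040))      # about 4,000,000 bits (500,000 bytes)
    print(bdp_segments(100e6, 0.040))  # 346 segments
```

Anything a sender keeps in flight beyond this amount cannot be in the pipe itself, so it has to sit in a queue at the bottleneck.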


  11. Source: https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext


  12. Filling up the pipe, just not overwhelming it

    Optimal operating point
    bottleneck packet arrival rate == BtlBw (rate balance equation)
    and
    total data in flight == BtlBw · Rtprop (full pipe equation)

    Thus simply measure both and send data accordingly

    But not so fast...


  13. Naive approach to measure Rtprop and BtlBw
    RTprop approximation:
    $\widehat{\mathit{RTprop}} = \mathit{RTprop} + \min(\eta_t) = \min(\mathit{RTT}_t) \quad \forall t \in [T - W_R, T]$
    BtlBw approximation:
    $\widehat{\mathit{BtlBw}} = \max(\mathit{deliveryRate}_t) \quad \forall t \in [T - W_B, T]$
    where
    $\mathit{deliveryRate} = \Delta\mathit{delivered} / \Delta t$
    Δdelivered can be inferred on the sender side from the ACKs it receives:
    they announce that data has left the pipe. The stack needs to keep track of Δt itself.
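The two estimators are a windowed minimum (for RTprop) and a windowed maximum (for BtlBw). A minimal sketch follows; the class name and the store-every-sample approach are assumptions made for readability, and real stacks use more compact filters.

```python
from collections import deque


class WindowedFilter:
    """Keep the best sample seen within the last `window` time units.

    `better` is `min` for an RTprop estimate and `max` for a BtlBw estimate.
    """

    def __init__(self, window: float, better):
        self.window = window
        self.better = better
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def update(self, now: float, value: float) -> float:
        self.samples.append((now, value))
        # Drop samples that fell out of the window [now - window, now].
        while self.samples[0][0] < now - self.window:
            self.samples.popleft()
        return self.better(v for _, v in self.samples)


if __name__ == "__main__":
    rtprop = WindowedFilter(window=10.0, better=min)
    for t, rtt in [(0, 0.050), (1, 0.040), (5, 0.060), (12, 0.055)]:
        print(t, rtprop.update(t, rtt))
```

The windows let the estimates follow path changes: an old minimum RTT or maximum delivery rate eventually expires instead of being trusted forever.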


  14. The problem with estimating RTprop and BtlBw

    Sending at steady state at optimal point doesn’t form queues
    but also doesn’t eliminate them

    Routing changes packet travel paths, thus also changes
    RTprop or BtlBw

    Furthermore: If RTprop can be observed then BtlBw cannot

    and vice versa


  15. When an ACK packet is received

    ACK arrival provides RTT and delivery rate estimates

    Always update RTT estimates

    Update deliveryRate estimates:
    – If (max_BtlBw < deliveryRate) or (packet not app-limited), then update the BtlBw max estimate
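The per-ACK bookkeeping can be sketched as follows. This is a simplified illustration, not the Linux implementation: `Packet`, `Conn`, and their fields are hypothetical names, and plain running min/max values stand in for the windowed filters.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size: int              # bytes
    send_time: float
    delivered: int         # connection's delivered byte count when this packet was sent
    delivered_time: float  # time at which that count was last updated
    app_limited: bool = False


class Conn:
    def __init__(self):
        self.delivered = 0
        self.delivered_time = 0.0
        self.rtprop = float("inf")  # min RTT seen (no window expiry in this sketch)
        self.btlbw = 0.0            # max delivery rate seen, in bytes/s

    def send(self, size: int, now: float, app_limited: bool = False) -> Packet:
        # Snapshot the delivery state so a rate can be computed when the ACK arrives.
        return Packet(size, now, self.delivered, self.delivered_time, app_limited)

    def on_ack(self, pkt: Packet, now: float) -> None:
        # Always update the RTT estimate.
        self.rtprop = min(self.rtprop, now - pkt.send_time)
        self.delivered += pkt.size
        self.delivered_time = now
        elapsed = self.delivered_time - pkt.delivered_time
        if elapsed <= 0:
            return
        rate = (self.delivered - pkt.delivered) / elapsed
        # App-limited samples may only raise the estimate. With a real windowed
        # filter, an accepted sample would also refresh the window.
        if rate > self.btlbw or not pkt.app_limited:
            self.btlbw = max(self.btlbw, rate)
```

A sample taken while the application had nothing to send says little about the path, which is why such samples are ignored unless they exceed the current maximum.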


  16. When data is sent

    Update packet state
    – Timestamp
    – Mark packets if application limited (not considered for BtlBw estimates)

    Packet pacing
    – adapt sending rate to BtlBw (smooths out bursts): pacing_rate

    Preferred way is installing a fair queue scheduler on the interface

    Since Linux v4.13, TCP-internal pacing is available
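Pacing boils down to spacing packets so that they leave at pacing_gain times the estimated BtlBw. A minimal sketch, with hypothetical function names and byte-based units chosen for illustration:

```python
def next_send_time(now: float, packet_size: int, btlbw: float, pacing_gain: float = 1.0) -> float:
    """Earliest time the next packet may leave.

    packet_size is in bytes and btlbw in bytes per second.
    """
    return now + packet_size / (pacing_gain * btlbw)


def pace(start: float, sizes, btlbw: float, pacing_gain: float = 1.0) -> list:
    """Departure times for a run of packets, evenly spaced instead of sent as a burst."""
    times = []
    t = start
    for size in sizes:
        times.append(t)
        t = next_send_time(t, size, btlbw, pacing_gain)
    return times


if __name__ == "__main__":
    # Four 1500-byte packets on a 1.5 MB/s bottleneck leave roughly 1 ms apart.
    print(pace(0.0, [1500] * 4, btlbw=1_500_000))
```

In practice this spacing is done by the fq qdisc or by the TCP stack's internal pacing timer rather than by application code.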


  17. Steady-state behavior

    BBR is a sender-side only congestion control algorithm

    Takes BtlBw and RTprop as input and controls adaptation and
    estimation of bottleneck constraints → control loop

    Probing phase:
    – cycle pacing_gain to probe for bandwidth and RTT

    Remember: pacing happens at bottleneck speed rate
    – Apply new measurements as soon as possible
    – Decrease gain (< 1) to eliminate possible build up queues
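The gain cycling can be illustrated with the eight-phase cycle described in the BBR paper (5/4, 3/4, then six phases at 1). The function name below is hypothetical, and the sketch ignores the per-phase timing (roughly one RTprop each).

```python
from itertools import cycle, islice

# Probe up, drain the queue that probing may have built, then cruise.
PROBE_BW_GAINS = (1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)


def pacing_rates(btlbw: float, phases: int) -> list:
    """Pacing rate used in each successive phase of the gain cycle."""
    return [gain * btlbw for gain in islice(cycle(PROBE_BW_GAINS), phases)]


if __name__ == "__main__":
    print(pacing_rates(100.0, 10))
```

The gains average to exactly 1 over a full cycle, so the flow sends at the estimated bottleneck rate on average while still testing periodically whether more bandwidth has become available.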


  18. Source: https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext


  19. Source: https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext


  20. Ramping up the connection without leaving
    queues behind

    How to reach steady-state behavior and bandwidth probing?

    Startup State:
    – Binary search of BtlBw with pacing_gain of 2/ln2
    – Finds BtlBw in log2(BDP) RTTs
    – But leaves up to 2 * BDP in the queues

    Thus, after start-up state, BBR enters drain state:
    – Uses gain of ln2/2 (inverse of above)
    – Empties queues until inflight drops to BDP

    BBR enters steady-state and begins probing

    Hopefully without queues!
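The two gains and the logarithmic search time can be checked numerically. The sketch assumes an idealised model in which the delivery rate doubles every round trip during startup; `rounds_to_find` is a hypothetical helper, not part of any BBR implementation.

```python
import math

STARTUP_GAIN = 2 / math.log(2)  # 2/ln2, roughly 2.885
DRAIN_GAIN = math.log(2) / 2    # ln2/2, the inverse, roughly 0.347


def rounds_to_find(btlbw: float, initial_rate: float) -> int:
    """Round trips needed until a rate that doubles each round reaches btlbw."""
    rounds = 0
    rate = initial_rate
    while rate < btlbw:
        rate *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(round(STARTUP_GAIN, 3), round(DRAIN_GAIN, 3))
    # Growing by a factor of 1000 takes only 10 doublings.
    print(rounds_to_find(btlbw=1000.0, initial_rate=1.0))
```

Because the drain gain is the exact inverse of the startup gain, the drain phase removes the queue at the same relative speed at which startup created it.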


  21. Source: https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext


  22. Sharing a pipe

    ProbeRTT state is entered when RTT filter expires
    – That is, when RTprop hasn’t been updated with a lower RTT for many seconds

    Reduces inflight to 4 * maximum segment size for at least one round trip

    When streams go into ProbeRTT state, they lower RTT for all flows on the
    system
    – Thus the last timestamp when RTT was last updated is shared between all flows
    – They tend to go into ProbeRTT state at the same time
    – This repeats and repeats, bringing RTT measurements closer to their physical value

    BBR achieves synchronization between connections
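The ProbeRTT trigger can be sketched as a simple expiry check. The 10-second filter length is an assumed value standing in for "many seconds", and the function names are hypothetical.

```python
PROBE_RTT_INFLIGHT_SEGMENTS = 4  # from the slide: 4 * maximum segment size
RTPROP_FILTER_LEN_S = 10.0       # assumption: stands in for "many seconds"


def rtprop_filter_expired(now: float, rtprop_stamp: float,
                          filter_len: float = RTPROP_FILTER_LEN_S) -> bool:
    """True when no lower RTT sample has refreshed RTprop within filter_len seconds."""
    return now - rtprop_stamp > filter_len


def target_inflight(in_probe_rtt: bool, normal_target_segments: int) -> int:
    """Inflight cap in segments: tiny during ProbeRTT, the usual target otherwise."""
    if in_probe_rtt:
        return min(normal_target_segments, PROBE_RTT_INFLIGHT_SEGMENTS)
    return normal_target_segments


if __name__ == "__main__":
    expired = rtprop_filter_expired(now=25.0, rtprop_stamp=12.0)
    print(expired, target_inflight(expired, normal_target_segments=346))
```

When one flow drains its share of the queue, the other flows see a lower RTT and refresh their own timestamps at about the same moment, which is what lines up their later ProbeRTT phases.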


  23. Source: https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext


  24. Status of Deployment

    Google widely deployed TCP-BBR inside their datacenters
    – Huge speed up especially for intercontinental links (high BDP)

    Google’s outbound facing servers started to use BBR
    – e.g. YouTube now paces packets very regularly, no bursts visible
    anymore
    – Improves latency in a lot of networks, especially with huge buffers


  25. “Imperfections”

    BBR causes high packet loss with competing BBR streams and
    small buffers along the path
    – BBRv2 might actually react to discovered packet loss

    Ramp-Up phase sometimes gives some flows unfair
    advantages

    Issues with middle boxes
    – Stretched / delayed ACKs
    – Policing systems can trick BBR


  26. Practical uses

    Use a recent Linux kernel

    Optional but recommended: add pacer to the outgoing interface
    – # tc qdisc replace dev DEV root fq (DEV is the name of the outgoing interface)
    – $ man tc-fq

    Enable the usage of bbr by default
    – # modprobe tcp_bbr
    – # sysctl -w net.ipv4.tcp_congestion_control=bbr
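Besides the system-wide sysctl, Linux lets a program pick the congestion control algorithm per socket through the TCP_CONGESTION socket option. The helper below is a hedged sketch: the function name is hypothetical, it assumes the tcp_bbr module is loaded and permitted for the calling user, and it falls back silently where the option is unavailable.

```python
import socket


def try_set_congestion_control(sock: socket.socket, name: str = "bbr"):
    """Try to select a congestion control algorithm for one TCP socket.

    Returns the algorithm actually in effect afterwards, or None if the
    platform does not expose TCP_CONGESTION (it is Linux-specific).
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode())
    except OSError:
        pass  # e.g. module not loaded or algorithm not allowed; keep the default
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name.
    return raw.split(b"\x00", 1)[0].decode()


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(try_set_congestion_control(s))
```

Checking the returned name matters because the request can fail quietly, leaving the socket on the system default such as cubic.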


  27. Conclusion and future

    BBR is a new idea of how to do congestion control
    – Still in development (see recent updates from IETF100)
    – Tries to adapt sending speed to the sweet spot

    Major new protocols on the horizon
    – TLSv1.3
    – HTTP/2

    QUIC as alternative to TCP?
    – QUIC inherits the same problems from TCP
    – Implemented in user space
    – Allows for more complicated algorithms (e.g. FPU, databases, A.I.)


  28. Thanks to

    The authors of this paper

    The netdev@ community and all the people trying to free the
    Internet of bufferbloat

    backtrace.io for inviting me to stay in New York for a month
    – (shameless plug: we are hiring)
