Good News (!!!) from the world of TCP Congestion Control
Google released a patch today that significantly improves how congestion control works in TCP.
First, the great part - the changes are on the sender side, and require no co-ordinated changes on the receiver, or the intermediary network. Which is awesome - it means that this can be incrementally deployed purely by updating the network stack at the end-points.
Seriously, that is awesome news. This field has been more art than science for the longest time, and despite the plethora of approaches out there, not much has really changed since the days of Reno.
Part of the reason for this is the network equivalent of the Heisenberg Uncertainty Principle - where bandwidth and network delay are inextricably linked, and can't be disambiguated. And the problem with that is that it turns out that you really, really want to look at the two independently to find the optimal operating point for a network.
Anyhow, Google's new algorithm - called BBR, for Bottleneck, Bandwidth, and RTT - is already running internally at Google, and seems to be doing quite the spectacular job. From the notes
More information is available at the BBR-Dev group.
(Incidentally, Van Jacobsen has signed off on this...)
First, the great part - the changes are on the sender side, and require no co-ordinated changes on the receiver, or the intermediary network. Which is awesome - it means that this can be incrementally deployed purely by updating the network stack at the end-points.
Seriously, that is awesome news. This field has been more art than science for the longest time, and despite the plethora of approaches out there, not much has really changed since the days of Reno.
Part of the reason for this is the network equivalent of the Heisenberg Uncertainty Principle - where bandwidth and network delay are inextricably linked, and can't be disambiguated. And the problem with that is that it turns out that you really, really want to look at the two independently to find the optimal operating point for a network.
Anyhow, Google's new algorithm - called BBR, for Bottleneck, Bandwidth, and RTT - is already running internally at Google, and seems to be doing quite the spectacular job. From the notes
BBR creates an explicit model of the network pipe by sequentially probing the bottleneck bandwidth and RTT. On the arrival of each ACK, BBR derives the current delivery rate of the last round trip, and feeds it through a windowed max-filter to estimate the bottleneck bandwidth. Conversely it uses a windowed min-filter to estimate the round trip propagation delay. The max-filtered bandwidth and min-filtered RTT estimates form BBR's model of the network pipe. Using its model, BBR sets control parameters to govern sending behavior. The primary control is the pacing rate: BBR applies a gain multiplier to transmit faster or slower than the observed bottleneck bandwidth. The conventional congestion window (cwnd) is now the secondary control; the cwnd is set to a small multiple of the estimated BDP (bandwidth-delay product) in order to allow full utilization and bandwidth probing while bounding the potential amount of queue at the bottleneck.This seems pretty normal, but only from the "5km up" perspective. The key here is how the RTT feeback is being incorporated/filtered to ramp up (and down) the bandwidth, and how this affect's BBR's view of the network pipe. Looking at the actual process makes the most sense
When a BBR connection starts, it enters STARTUP mode and applies a high gain to perform an exponential search to quickly probe the bottleneck bandwidth (doubling its sending rate each round trip, like slow start). However, instead of continuing until it fills up the buffer (i.e. a loss), or until delay or ACK spacing reaches some threshold (like Hystart), it uses its model of the pipe to estimate when that pipe is full: it estimates the pipe is full when it notices the estimated bandwidth has stopped growing. At that point it exits STARTUP and enters DRAIN mode, where it reduces its pacing rate to drain the queue it estimates it has created. Then BBR enters steady state. In steady state, PROBE_BW mode cycles between first pacing faster to probe for more bandwidth, then pacing slower to drain any queue that created if no more bandwidth was available, and then cruising at the estimated bandwidth to utilize the pipe without creating excess queue. Occasionally, on an as-needed basis, it sends significantly slower to probe for RTT (PROBE_RTT mode).Hat tip - +Michael Gebetsroither .
More information is available at the BBR-Dev group.
(Incidentally, Van Jacobsen has signed off on this...)
Comments