Good News (!!!) from the world of TCP Congestion Control

Google released a patch today that significantly improves how congestion control works in TCP.

First, the great part - the changes are on the sender side, and require no co-ordinated changes on the receiver, or the intermediary network.  Which is awesome - it means that this can be incrementally deployed purely by updating the network stack at the end-points.

Seriously, that is awesome news.  This field has been more art than science for the longest time, and despite the plethora of approaches out there, not much has really changed since the days of Reno.
Part of the reason for this is the network equivalent of the Heisenberg Uncertainty Principle - where bandwidth and network delay are inextricably linked, and can't be disambiguated.  And the problem with that is that it turns out that you really, really want to look at the two independently to find the optimal operating point for a network.

Anyhow, Google's new algorithm - called BBR, for Bottleneck, Bandwidth, and RTT - is already running internally at Google, and seems to be doing quite the spectacular job.  From the notes
BBR creates an explicit model of the network pipe by
sequentially probing the bottleneck bandwidth and RTT. On the arrival
of each ACK, BBR derives the current delivery rate of the last round
trip, and feeds it through a windowed max-filter to estimate the
bottleneck bandwidth. Conversely it uses a windowed min-filter to
estimate the round trip propagation delay. The max-filtered bandwidth
and min-filtered RTT estimates form BBR's model of the network pipe.

Using its model, BBR sets control parameters to govern sending
behavior. The primary control is the pacing rate: BBR applies a gain
multiplier to transmit faster or slower than the observed bottleneck
bandwidth. The conventional congestion window (cwnd) is now the
secondary control; the cwnd is set to a small multiple of the
estimated BDP (bandwidth-delay product) in order to allow full
utilization and bandwidth probing while bounding the potential amount
of queue at the bottleneck.
This seems pretty normal, but only from the "5km up" perspective.  The key here is how the RTT feeback is being incorporated/filtered to ramp up (and down) the bandwidth, and how this affect's BBR's view of the network pipe.  Looking at the actual process makes the most sense
When a BBR connection starts, it enters STARTUP mode and applies a
high gain to perform an exponential search to quickly probe the
bottleneck bandwidth (doubling its sending rate each round trip, like
slow start). However, instead of continuing until it fills up the
buffer (i.e. a loss), or until delay or ACK spacing reaches some
threshold (like Hystart), it uses its model of the pipe to estimate
when that pipe is full: it estimates the pipe is full when it notices
the estimated bandwidth has stopped growing. At that point it exits
STARTUP and enters DRAIN mode, where it reduces its pacing rate to
drain the queue it estimates it has created.

Then BBR enters steady state. In steady state, PROBE_BW mode cycles
between first pacing faster to probe for more bandwidth, then pacing
slower to drain any queue that created if no more bandwidth was
available, and then cruising at the estimated bandwidth to utilize the
pipe without creating excess queue. Occasionally, on an as-needed
basis, it sends significantly slower to probe for RTT (PROBE_RTT
Hat tip - +Michael Gebetsroither .
More information is available at the BBR-Dev group.

(Incidentally, Van Jacobsen has signed off on this...)

Post a Comment

Popular posts from this blog

Erlang, Binaries, and Garbage Collection (Sigh)