robusthttp

package
v1.27.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 17, 2026 License: Apache-2.0, MIT Imports: 13 Imported by: 0

Documentation

Overview

Package robusthttp implements automatic retry logic for HTTP transfers that can recover from TCP streams that become "stuck" at extremely low throughput.

The Problem: TCP Streams Getting Stuck

On multi-gigabit network connections, especially when multiple TCP flows compete for bandwidth, individual streams can become trapped in a state where they transmit only a few bytes per second despite having abundant link capacity available. This phenomenon occurs both on local networks and over the internet, with long-haul cross-continent routes being particularly susceptible.

Why TCP Streams Get Stuck

TCP's congestion control algorithms (Cubic, BBR, Reno, etc.) attempt to find the available bandwidth by probing: they increase the sending rate until packet loss occurs, then back off. When multiple flows compete, several failure modes emerge:

  1. Congestion Window Collapse: When packet loss occurs, TCP drastically reduces its congestion window (cwnd). On high-bandwidth, high-latency paths (large BDP), recovery can take minutes because the additive increase phase grows cwnd by only ~1 MSS per RTT. A flow that loses packets while competing with aggressive flows may never recover its fair share.

  2. Bufferbloat-Induced Starvation: When network buffers are oversized, competing flows can fill queues with thousands of packets. A flow that experiences loss during this state backs off, while the competing flows continue to hold queue space. The backed-off flow's packets consistently arrive at the tail of deep queues, experiencing enormous RTTs (seconds) and further timeouts.

  3. Retransmission Timeout (RTO) Spirals: After loss, if the retransmitted packets are also lost (common under congestion), RTO doubles exponentially (up to 60-120 seconds). The flow becomes effectively dormant while competing flows absorb the freed capacity. The standard RTO minimum of 200ms-1s further delays recovery.

  4. Receive Window Limitations: On some paths, especially through middleboxes or with asymmetric routing, receive window updates can be delayed or lost, causing the sender to stall waiting for window space that the receiver has already advertised but the sender never received.

  5. Path Asymmetry on Long Routes: Cross-continent connections often traverse different physical paths for forward and return traffic. ACK packets returning on congested reverse paths can be delayed or lost, starving the sender of acknowledgments and causing spurious retransmissions or window stalls.

Why Restarting Helps

Establishing a new TCP connection provides a fresh start that bypasses many of these stuck states:

  • Fresh Congestion Window: New connections start with initial cwnd (typically 10 MSS) and can use slow-start to rapidly probe for available bandwidth, rather than being trapped in a collapsed cwnd with linear recovery.

  • Queue Position Reset: The new flow's packets enter queues without the accumulated RTT debt of the stuck flow. On paths with AQM (Active Queue Management like fq_codel), new flows get their own queue slot.

  • RTO Timer Reset: The new connection has a fresh RTO estimate based on current RTT measurements, rather than an exponentially backed-off timer.

  • Potential Path Diversity: ECMP (Equal-Cost Multi-Path) routing may assign the new flow to a different physical path, avoiding congested links entirely.

  • Updated Receive Window: The receiver re-advertises its current window, resolving any stale window state from the old connection.

How This Package Works

RobustGet creates an HTTP reader that monitors transfer rate and automatically recovers from stuck connections:

  1. Rate Monitoring: Each read operation is tracked. Periodically (every few seconds), the transfer rate is computed and compared against a minimum threshold that accounts for the number of concurrent transfers.

  2. Adaptive Threshold: The minimum acceptable rate scales with the number of active transfers (using logarithmic scaling) and respects the expected link capacity. A single transfer must sustain higher throughput than when bandwidth is split among many.

  3. Connection Restart: When rate drops below threshold, the current connection is closed and a new HTTP request is issued with a Range header starting at the last successfully received byte offset.

  4. Retry Limits: To prevent infinite retry loops on truly failed servers, a maximum retry count (15 attempts) is enforced before returning an error.

This approach is particularly effective for large file transfers (sectors, pieces) where the cost of restarting a 32GiB transfer from 10% progress is acceptable compared to waiting potentially hours for a stuck connection to recover (if it ever does).

Index

Constants

This section is empty.

Variables

View Source
var TotalTransferDivFactor int64 = 4

Functions

func RobustGet

func RobustGet(url string, headers http.Header, dataSize int64, rcf func() *RateCounter) io.ReadCloser

Types

type RateCounter

type RateCounter struct {
	// contains filtered or unexported fields
}

func (*RateCounter) Check

func (rc *RateCounter) Check(cb func() error) error

Check allows only single concurrent check per peer - this is to prevent multiple concurrent checks causing all transfers to fail at once. When we drop a peer, we'll reduce rc.transfers, so the next check will require less total bandwidth (assuming that MinAvgGlobalLogPeerRate is used).

func (*RateCounter) Release

func (rc *RateCounter) Release()

type RateCounters

type RateCounters[K comparable] struct {
	// contains filtered or unexported fields
}

func NewRateCounters

func NewRateCounters[K comparable](rateFunc RateFunc) *RateCounters[K]

func (*RateCounters[K]) Get

func (rc *RateCounters[K]) Get(key K) *RateCounter

type RateEnforcingReader

type RateEnforcingReader struct {
	// contains filtered or unexported fields
}

func NewRateEnforcingReader

func NewRateEnforcingReader(r io.Reader, rc *RateCounter, windowDuration time.Duration) *RateEnforcingReader

func (*RateEnforcingReader) Done

func (rer *RateEnforcingReader) Done()

func (*RateEnforcingReader) Read

func (rer *RateEnforcingReader) Read(p []byte) (int, error)

func (*RateEnforcingReader) ReadError

func (rer *RateEnforcingReader) ReadError() error

type RateFunc

type RateFunc func(transferRateMbps float64, peerTransfers, totalTransfers int64) error

func MinAvgGlobalLogPeerRate

func MinAvgGlobalLogPeerRate(minTxRateMbps, linkMbps float64) RateFunc

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL