Documentation ¶
Overview ¶
Package robusthttp implements automatic retry logic for HTTP transfers, recovering from TCP streams that become "stuck" at extremely low throughput.
The Problem: TCP Streams Getting Stuck ¶
On multi-gigabit network connections, especially when multiple TCP flows compete for bandwidth, individual streams can become trapped in a state where they transmit only a few bytes per second despite having abundant link capacity available. This phenomenon occurs both on local networks and over the internet, with long-haul cross-continent routes being particularly susceptible.
Why TCP Streams Get Stuck ¶
TCP's congestion control algorithms (Cubic, BBR, Reno, etc.) attempt to find the available bandwidth by probing: they increase the sending rate until packet loss occurs, then back off. When multiple flows compete, several failure modes emerge:
Congestion Window Collapse: When packet loss occurs, TCP drastically reduces its congestion window (cwnd). On high-bandwidth, high-latency paths (large BDP), recovery can take minutes because the additive increase phase grows cwnd by only ~1 MSS per RTT. A flow that loses packets while competing with aggressive flows may never recover its fair share.
Bufferbloat-Induced Starvation: When network buffers are oversized, competing flows can fill queues with thousands of packets. A flow that experiences loss during this state backs off, while the competing flows continue to hold queue space. The backed-off flow's packets consistently arrive at the tail of deep queues, experiencing enormous RTTs (seconds) and further timeouts.
Retransmission Timeout (RTO) Spirals: After loss, if the retransmitted packets are also lost (common under congestion), RTO doubles exponentially (up to 60-120 seconds). The flow becomes effectively dormant while competing flows absorb the freed capacity. The standard RTO minimum of 200ms-1s further delays recovery.
Receive Window Limitations: On some paths, especially through middleboxes or with asymmetric routing, receive window updates can be delayed or lost, causing the sender to stall waiting for window space that the receiver has already advertised but the sender never received.
Path Asymmetry on Long Routes: Cross-continent connections often traverse different physical paths for forward and return traffic. ACK packets returning on congested reverse paths can be delayed or lost, starving the sender of acknowledgments and causing spurious retransmissions or window stalls.
Why Restarting Helps ¶
Establishing a new TCP connection provides a fresh start that bypasses many of these stuck states:
Fresh Congestion Window: New connections start with initial cwnd (typically 10 MSS) and can use slow-start to rapidly probe for available bandwidth, rather than being trapped in a collapsed cwnd with linear recovery.
Queue Position Reset: The new flow's packets enter queues without the accumulated RTT debt of the stuck flow. On paths with AQM (Active Queue Management like fq_codel), new flows get their own queue slot.
RTO Timer Reset: The new connection has a fresh RTO estimate based on current RTT measurements, rather than an exponentially backed-off timer.
Potential Path Diversity: ECMP (Equal-Cost Multi-Path) routing may assign the new flow to a different physical path, avoiding congested links entirely.
Updated Receive Window: The receiver re-advertises its current window, resolving any stale window state from the old connection.
How This Package Works ¶
RobustGet creates an HTTP reader that monitors transfer rate and automatically recovers from stuck connections:
Rate Monitoring: Each read operation is tracked. Periodically (every few seconds), the transfer rate is computed and compared against a minimum threshold that accounts for the number of concurrent transfers.
Adaptive Threshold: The minimum acceptable rate scales with the number of active transfers (using logarithmic scaling) and respects the expected link capacity. A single transfer must sustain higher throughput than when bandwidth is split among many.
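The exact threshold formula is internal to the package; the sketch below is a hypothetical illustration of log-scaled thresholds, reusing the exported TotalTransferDivFactor name but assuming a made-up formula and a caller-supplied link capacity.

```go
package main

import (
	"fmt"
	"math"
)

// TotalTransferDivFactor mirrors the package variable's default value.
// The formula below is an illustrative assumption, not the package's code.
var TotalTransferDivFactor int64 = 4

// minRate returns a per-transfer minimum rate (bytes/s) that shrinks
// logarithmically as more transfers share the link, so a single transfer
// must sustain more throughput than one of many concurrent transfers.
func minRate(linkCapacity float64, transfers int) float64 {
	if transfers < 1 {
		transfers = 1
	}
	share := 1.0 + math.Log2(float64(transfers))
	return linkCapacity / (share * float64(TotalTransferDivFactor))
}

func main() {
	link := 1e9 / 8 // 1 Gbit/s link, in bytes per second
	fmt.Printf("1 transfer:  %.0f B/s\n", minRate(link, 1))
	fmt.Printf("8 transfers: %.0f B/s\n", minRate(link, 8))
}
```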
Connection Restart: When rate drops below threshold, the current connection is closed and a new HTTP request is issued with a Range header starting at the last successfully received byte offset.
Retry Limits: To prevent infinite retry loops on truly failed servers, a maximum retry count (15 attempts) is enforced before returning an error.
This approach is particularly effective for large file transfers (sectors, pieces) where the cost of restarting a 32GiB transfer from 10% progress is acceptable compared to waiting potentially hours for a stuck connection to recover (if it ever does).
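The restart-with-Range mechanism can be sketched end to end against a local test server. This is a simplified stand-in for RobustGet, not the package's implementation: the "stall" is simulated by abandoning each response body after 1 KiB, and each retry resumes with a Range header at the last received offset.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// demo runs a restart-with-Range loop against a local test server and
// returns whether the full payload was reassembled, plus the attempt count.
func demo() (ok bool, attempts int) {
	data := []byte(strings.Repeat("robusthttp ", 1000))

	// Test server that honors a simple "Range: bytes=N-" header.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		off := 0
		if rng := r.Header.Get("Range"); strings.HasPrefix(rng, "bytes=") {
			off, _ = strconv.Atoi(strings.TrimSuffix(strings.TrimPrefix(rng, "bytes="), "-"))
			w.WriteHeader(http.StatusPartialContent)
		}
		w.Write(data[off:])
	}))
	defer srv.Close()

	got := make([]byte, 0, len(data))
	for len(got) < len(data) && attempts < 15 { // retry cap, mirroring the package's limit
		attempts++
		req, _ := http.NewRequest("GET", srv.URL, nil)
		// Resume from the last successfully received byte offset.
		req.Header.Set("Range", fmt.Sprintf("bytes=%d-", len(got)))
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			continue
		}
		// Simulate a stalled connection by abandoning the body after 1 KiB.
		buf := make([]byte, 1024)
		n, _ := io.ReadFull(resp.Body, buf)
		got = append(got, buf[:n]...)
		resp.Body.Close()
	}
	return bytes.Equal(got, data), attempts
}

func main() {
	ok, attempts := demo()
	fmt.Println("complete:", ok, "attempts:", attempts)
}
```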
Index ¶
Constants ¶
This section is empty.
Variables ¶
var TotalTransferDivFactor int64 = 4
Functions ¶
func RobustGet ¶
func RobustGet(url string, headers http.Header, dataSize int64, rcf func() *RateCounter) io.ReadCloser
Types ¶
type RateCounter ¶
type RateCounter struct {
// contains filtered or unexported fields
}
func (*RateCounter) Check ¶
func (rc *RateCounter) Check(cb func() error) error
Check allows only a single concurrent check per peer - this is to prevent multiple concurrent checks from causing all transfers to fail at once. When we drop a peer, we'll reduce rc.transfers, so the next check will require less total bandwidth (assuming that MinAvgGlobalLogPeerRate is used).
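The "single concurrent check" rule means a caller that arrives while another check is in flight is skipped rather than queued. A minimal sketch of those semantics using sync.Mutex.TryLock (Go 1.18+) follows; this is an assumption about the mechanism, not the package's actual implementation, and the `ran` return value is added here purely for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// checker sketches the single-concurrent-check rule: a second caller
// that arrives while a check is running is skipped, not queued.
type checker struct{ mu sync.Mutex }

// Check runs cb only if no other check is in flight; ran reports
// whether cb was actually invoked.
func (c *checker) Check(cb func() error) (ran bool, err error) {
	if !c.mu.TryLock() { // another check is already running for this peer
		return false, nil
	}
	defer c.mu.Unlock()
	return true, cb()
}

func main() {
	var c checker
	ran, _ := c.Check(func() error {
		// A nested (i.e. concurrent) check is skipped.
		inner, _ := c.Check(func() error { return nil })
		fmt.Println("nested check ran:", inner)
		return nil
	})
	fmt.Println("outer check ran:", ran)
}
```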
func (*RateCounter) Release ¶
func (rc *RateCounter) Release()
type RateCounters ¶
type RateCounters[K comparable] struct {
// contains filtered or unexported fields
}
func NewRateCounters ¶
func NewRateCounters[K comparable](rateFunc RateFunc) *RateCounters[K]
func (*RateCounters[K]) Get ¶
func (rc *RateCounters[K]) Get(key K) *RateCounter
type RateEnforcingReader ¶
type RateEnforcingReader struct {
// contains filtered or unexported fields
}
func NewRateEnforcingReader ¶
func NewRateEnforcingReader(r io.Reader, rc *RateCounter, windowDuration time.Duration) *RateEnforcingReader
func (*RateEnforcingReader) Done ¶
func (rer *RateEnforcingReader) Done()
func (*RateEnforcingReader) ReadError ¶
func (rer *RateEnforcingReader) ReadError() error