Documentation
¶
Overview ¶
Package robusthttp implements automatic retry logic for HTTP transfers that can recover from TCP streams that become "stuck" at extremely low throughput.
The Problem: TCP Streams Getting Stuck ¶
On multi-gigabit network connections, especially when multiple TCP flows compete for bandwidth, individual streams can become trapped in a state where they transmit only a few bytes per second despite having abundant link capacity available. This phenomenon occurs both on local networks and over the internet, with long-haul cross-continent routes being particularly susceptible.
Why TCP Streams Get Stuck ¶
TCP's congestion control algorithms (Cubic, BBR, Reno, etc.) attempt to find the available bandwidth by probing: they increase the sending rate until packet loss occurs, then back off. When multiple flows compete, several failure modes emerge:
Congestion Window Collapse: When packet loss occurs, TCP drastically reduces its congestion window (cwnd). On high-bandwidth, high-latency paths (large BDP), recovery can take minutes because the additive increase phase grows cwnd by only ~1 MSS per RTT. A flow that loses packets while competing with aggressive flows may never recover its fair share.
Bufferbloat-Induced Starvation: When network buffers are oversized, competing flows can fill queues with thousands of packets. A flow that experiences loss during this state backs off, while the competing flows continue to hold queue space. The backed-off flow's packets consistently arrive at the tail of deep queues, experiencing enormous RTTs (seconds) and further timeouts.
Retransmission Timeout (RTO) Spirals: After loss, if the retransmitted packets are also lost (common under congestion), RTO doubles exponentially (up to 60-120 seconds). The flow becomes effectively dormant while competing flows absorb the freed capacity. The standard RTO minimum of 200ms-1s further delays recovery.
Receive Window Limitations: On some paths, especially through middleboxes or with asymmetric routing, receive window updates can be delayed or lost, causing the sender to stall waiting for window space that the receiver has already advertised but the sender never received.
Path Asymmetry on Long Routes: Cross-continent connections often traverse different physical paths for forward and return traffic. ACK packets returning on congested reverse paths can be delayed or lost, starving the sender of acknowledgments and causing spurious retransmissions or window stalls.
Why Restarting Helps ¶
Establishing a new TCP connection provides a fresh start that bypasses many of these stuck states:
Fresh Congestion Window: New connections start with initial cwnd (typically 10 MSS) and can use slow-start to rapidly probe for available bandwidth, rather than being trapped in a collapsed cwnd with linear recovery.
Queue Position Reset: The new flow's packets enter queues without the accumulated RTT debt of the stuck flow. On paths with AQM (Active Queue Management like fq_codel), new flows get their own queue slot.
RTO Timer Reset: The new connection has a fresh RTO estimate based on current RTT measurements, rather than an exponentially backed-off timer.
Potential Path Diversity: ECMP (Equal-Cost Multi-Path) routing may assign the new flow to a different physical path, avoiding congested links entirely.
Updated Receive Window: The receiver re-advertises its current window, resolving any stale window state from the old connection.
How This Package Works ¶
RobustGet creates an HTTP reader that monitors transfer rate and automatically recovers from stuck connections:
Rate Monitoring: Each read operation is tracked. Periodically (every few seconds), the transfer rate is computed and compared against a minimum threshold that accounts for the number of concurrent transfers.
Adaptive Threshold: The minimum acceptable rate scales with the number of active transfers (using logarithmic scaling) and respects the expected link capacity. A single transfer must sustain higher throughput than when bandwidth is split among many.
Connection Restart: When rate drops below threshold, the current connection is closed and a new HTTP request is issued with a Range header starting at the last successfully received byte offset.
Retry Limits: To prevent infinite retry loops on truly failed servers, a maximum retry count (15 attempts) is enforced before returning an error.
This approach is particularly effective for large file transfers (sectors, pieces) where the cost of restarting a 32GiB transfer from 10% progress is acceptable compared to waiting potentially hours for a stuck connection to recover (if it ever does).
Index ¶
- Variables
- func NewSSRFProtectedHTTPClient(policy *SSRFPolicy, originalHeaders http.Header) (*http.Client, func() net.Conn)
- func RobustGet(url string, headers http.Header, dataSize int64, rcf func() *RateCounter) io.ReadCloser
- func RobustGetWithSSRFPolicy(url string, headers http.Header, dataSize int64, rcf func() *RateCounter, ...) io.ReadCloser
- func SetGlobalSSRFOverride(p *SSRFPolicy)
- func ValidateClientFetchURL(raw string, headers http.Header, policy *SSRFPolicy) (*url.URL, error)
- type RateCounter
- type RateCounters
- type RateEnforcingReader
- type RateFunc
- type SSRFPolicy
Constants ¶
This section is empty.
Variables ¶
var TotalTransferDivFactor int64 = 4
Functions ¶
func NewSSRFProtectedHTTPClient ¶ added in v1.27.4
func NewSSRFProtectedHTTPClient(policy *SSRFPolicy, originalHeaders http.Header) (*http.Client, func() net.Conn)
NewSSRFProtectedHTTPClient is for the actual downloader.
func RobustGet ¶
func RobustGet(url string, headers http.Header, dataSize int64, rcf func() *RateCounter) io.ReadCloser
func RobustGetWithSSRFPolicy ¶ added in v1.27.4
func RobustGetWithSSRFPolicy(url string, headers http.Header, dataSize int64, rcf func() *RateCounter, policy *SSRFPolicy) io.ReadCloser
func SetGlobalSSRFOverride ¶ added in v1.28.0
func SetGlobalSSRFOverride(p *SSRFPolicy)
SetGlobalSSRFOverride installs a fallback SSRFPolicy that is used whenever a caller passes a nil policy. Call with nil to clear the override.
func ValidateClientFetchURL ¶ added in v1.27.4
ValidateClientFetchURL is for early deal/proposal validation.
Types ¶
type RateCounter ¶
type RateCounter struct {
// contains filtered or unexported fields
}
func (*RateCounter) Check ¶
func (rc *RateCounter) Check(cb func() error) error
Check allows only single concurrent check per peer - this is to prevent multiple concurrent checks causing all transfers to fail at once. When we drop a peer, we'll reduce rc.transfers, so the next check will require less total bandwidth (assuming that MinAvgGlobalLogPeerRate is used).
func (*RateCounter) Release ¶
func (rc *RateCounter) Release()
type RateCounters ¶
type RateCounters[K comparable] struct { // contains filtered or unexported fields }
func NewRateCounters ¶
func NewRateCounters[K comparable](rateFunc RateFunc) *RateCounters[K]
func (*RateCounters[K]) Get ¶
func (rc *RateCounters[K]) Get(key K) *RateCounter
type RateEnforcingReader ¶
type RateEnforcingReader struct {
// contains filtered or unexported fields
}
func NewRateEnforcingReader ¶
func NewRateEnforcingReader(r io.Reader, rc *RateCounter, windowDuration time.Duration) *RateEnforcingReader
func (*RateEnforcingReader) Done ¶
func (rer *RateEnforcingReader) Done()
func (*RateEnforcingReader) ReadError ¶
func (rer *RateEnforcingReader) ReadError() error
type RateFunc ¶
func MinAvgGlobalLogPeerRate ¶
type SSRFPolicy ¶ added in v1.27.4
type SSRFPolicy struct {
MaxURLLength int
MaxHeaderBytes int
MaxHeaderValues int
MaxRedirects int
DialTimeout time.Duration
TLSHandshakeTimeout time.Duration
ResponseHeaderTimeout time.Duration
AllowLoopbackIPs bool
AllowLocalHostnames bool
AllowPrivateIPs bool
// AllowedHosts is a list of host or host:port entries that bypass IP and
// hostname validation. Entries without a port match any port for that host.
AllowedHosts []string
// Disabled skips all SSRF host/IP/header validation. URL structure checks
// (scheme, control characters) are still enforced.
Disabled bool
}
SSRFPolicy is for controlling the behavior of the downloader to prevent Server-Side Request Forgery
func DefaultSSRFPolicy ¶ added in v1.27.4
func DefaultSSRFPolicy() SSRFPolicy