Overview
Package checker verifies whether URLs are alive by making HTTP requests. It uses a worker pool for bounded concurrency and retries transient failures with exponential backoff.
Constants
const (
	// DefaultConcurrency is the number of concurrent workers.
	// Higher values speed up checking but use more resources.
	DefaultConcurrency = 50

	// DefaultTimeout is the maximum time to wait for a single HTTP request.
	// Most healthy servers respond within 2-3 seconds.
	DefaultTimeout = 5 * time.Second

	// DefaultMaxRetries is the number of times to retry a failed request.
	// Lower values speed up checking but may miss transient failures.
	DefaultMaxRetries = 1

	// DefaultMaxRedirects is the maximum number of redirects to follow.
	// Most legitimate redirects are 1-2 hops.
	DefaultMaxRedirects = 5

	// DefaultUserAgent is the User-Agent header sent with requests.
	DefaultUserAgent = "gone-link-checker/1.0"
)
Default values for checker options. They favor fast checking while keeping reasonable reliability.
const BrowserUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " +
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
BrowserUserAgent is a realistic browser User-Agent for bypassing bot detection.
Types
type Checker
type Checker struct {
// contains filtered or unexported fields
}
Checker performs concurrent link checking with configurable options.
func (*Checker) Check
Check checks links concurrently using a worker pool and streams the results. URLs are deduplicated: each unique URL is checked once, and repeat occurrences are reported as StatusDuplicate. The returned channel is closed once all links have been checked. Cancel the context to stop checks in progress.
type Link
type Link struct {
URL string // The URL to check
FilePath string // Source file where the link was found
Text string // Link text (e.g., "Click here") for display purposes
Line int // Line number in the source file (0 if unknown)
}
Link represents a URL to be checked. This is decoupled from parser.Link to keep the checker package independent.
type LinkStatus
type LinkStatus int
LinkStatus represents the category of a checked link.
const (
	// StatusAlive indicates the link returned a 2xx response.
	StatusAlive LinkStatus = iota
	// StatusRedirect indicates the link redirected but the final destination is alive.
	StatusRedirect
	// StatusBlocked indicates the server returned 403 (possible bot detection).
	StatusBlocked
	// StatusDead indicates the link is broken (4xx except 403, 5xx, or redirect to dead).
	StatusDead
	// StatusError indicates a network error (timeout, DNS failure, connection refused).
	StatusError
	// StatusDuplicate indicates this link was already checked (references primary result).
	StatusDuplicate
)
func (LinkStatus) Description
func (s LinkStatus) Description() string
Description returns a human-readable explanation of the status. It uses predefined strings to avoid allocations.
func (LinkStatus) Label
func (s LinkStatus) Label() string
Label returns a short label for display (e.g., in badges). It uses predefined strings to avoid allocations.
func (LinkStatus) String
func (s LinkStatus) String() string
String returns the string representation of the status. It uses predefined strings to avoid allocations.
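One common way to return strings without allocating is a package-level lookup table indexed by the enum value. The sketch below illustrates the idea; the names in `statusNames` are placeholders, since the package's actual strings are not shown here.

```go
package main

import "fmt"

type linkStatus int

const (
	statusAlive linkStatus = iota
	statusRedirect
	statusBlocked
	statusDead
	statusError
	statusDuplicate
)

// statusNames is a hypothetical lookup table. Returning an element of a
// package-level array hands back an existing string, so no allocation occurs.
var statusNames = [...]string{"alive", "redirect", "blocked", "dead", "error", "duplicate"}

// String guards against out-of-range values before indexing the table.
func (s linkStatus) String() string {
	if s < 0 || int(s) >= len(statusNames) {
		return "unknown"
	}
	return statusNames[s]
}

func main() {
	fmt.Println(statusBlocked)  // blocked
	fmt.Println(linkStatus(42)) // unknown
}
```

Label and Description would follow the same pattern with their own tables.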
type Options
type Options struct {
// UserAgent is the User-Agent header sent with requests.
// Some servers block requests without a proper User-Agent.
UserAgent string
// Concurrency is the number of concurrent workers checking links.
// Higher values mean faster checking but more resource usage.
Concurrency int
// Timeout is the maximum time to wait for a single HTTP request.
// This includes connection, TLS handshake, and response headers.
Timeout time.Duration
// MaxRetries is the number of times to retry a failed request.
// Only transient errors (timeouts, 5xx, 429) are retried.
MaxRetries int
// MaxRedirects is the maximum number of redirects to follow.
MaxRedirects int
}
Options configures the behavior of the link checker.
func DefaultOptions
func DefaultOptions() Options
DefaultOptions returns a default configuration that prioritizes speed while maintaining reasonable reliability.
func (Options) WithConcurrency
WithConcurrency sets the number of concurrent workers.
func (Options) WithMaxRedirects
WithMaxRedirects sets the maximum number of redirects to follow.
func (Options) WithMaxRetries
WithMaxRetries sets the maximum retry count.
func (Options) WithTimeout
WithTimeout sets the request timeout.
func (Options) WithUserAgent
WithUserAgent sets the User-Agent header.
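The With* methods have value receivers, which suggests a copy-and-modify builder. Their full signatures are not listed here, so the sketch below assumes each takes the new value and returns an updated Options copy; `options` is a hypothetical local mirror of the struct.

```go
package main

import (
	"fmt"
	"time"
)

// options is a hypothetical mirror of the documented Options struct.
type options struct {
	UserAgent    string
	Concurrency  int
	Timeout      time.Duration
	MaxRetries   int
	MaxRedirects int
}

// Each method receives a copy of o, modifies it, and returns it, so the
// caller's original value is never mutated and calls can be chained.
func (o options) WithConcurrency(n int) options {
	o.Concurrency = n
	return o
}

func (o options) WithTimeout(d time.Duration) options {
	o.Timeout = d
	return o
}

func (o options) WithUserAgent(ua string) options {
	o.UserAgent = ua
	return o
}

func main() {
	base := options{Concurrency: 50, Timeout: 5 * time.Second}
	custom := base.WithConcurrency(10).WithTimeout(time.Second)
	fmt.Println(base.Concurrency, custom.Concurrency, custom.Timeout) // 50 10 1s
}
```

Because nothing is shared, a default configuration can be reused safely as the starting point for several customized ones.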
type Redirect
type Redirect struct {
URL string // The URL that redirected
StatusCode int // The redirect status code (301, 302, 307, 308)
}
Redirect represents a single hop in a redirect chain.
type Result
type Result struct {
// Duplicate info (populated when Status == StatusDuplicate)
DuplicateOf *Result // Points to primary result if this is a duplicate
Link Link // The original link that was checked
Error string // Error message if applicable
FinalURL string // Final destination URL after following redirects
// Redirect info (populated when redirects occurred)
RedirectChain []Redirect // Full chain of redirects
StatusCode int // HTTP status code (0 if request failed)
Status LinkStatus // Computed status category
FinalStatus int // Status code of final destination
}
Result represents the outcome of checking a single link.
func FilterAlive
FilterAlive returns only the results whose links are alive. It pre-allocates slice capacity on the expectation that alive links are typically the majority.
func FilterByStatus
func FilterByStatus(results []Result, status LinkStatus) []Result
FilterByStatus returns the results matching the given status. It pre-allocates slice capacity based on an expected ratio of matches.
func FilterDead
FilterDead returns the results whose links are dead or failed with errors. It pre-allocates slice capacity based on an expected ratio of matches.
func FilterDuplicates
FilterDuplicates returns only the duplicate results. It pre-allocates slice capacity based on an expected ratio of matches.
func FilterWarnings
FilterWarnings returns the results with a warning status (redirect or blocked). It pre-allocates slice capacity based on an expected ratio of matches.
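The filters share one pattern: allocate a slice with an estimated capacity, then append matches. In the sketch below, the one-in-eight estimate is a hypothetical ratio, not the package's value. A good estimate avoids most slice regrowth, and a poor one only costs an extra copy when `append` grows the slice.

```go
package main

import "fmt"

type linkStatus int

const (
	statusAlive linkStatus = iota
	statusRedirect
	statusBlocked
	statusDead
	statusError
)

// result is a hypothetical, trimmed-down stand-in for Result.
type result struct {
	URL    string
	Status linkStatus
}

// filterByStatus keeps results with the given status, pre-allocating room
// for an assumed one-in-eight match rate.
func filterByStatus(results []result, status linkStatus) []result {
	out := make([]result, 0, len(results)/8+1)
	for _, r := range results {
		if r.Status == status {
			out = append(out, r)
		}
	}
	return out
}

// filterDead keeps both dead links and network errors.
func filterDead(results []result) []result {
	out := make([]result, 0, len(results)/8+1)
	for _, r := range results {
		if r.Status == statusDead || r.Status == statusError {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rs := []result{{"a", statusAlive}, {"b", statusDead}, {"c", statusError}, {"d", statusAlive}}
	fmt.Println(len(filterByStatus(rs, statusAlive)), len(filterDead(rs))) // 2 2
}
```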
func (Result) IsAlive
IsAlive returns true if the link is considered alive (2xx response). It is kept for backward compatibility.
func (Result) IsDuplicate
IsDuplicate returns true if this result is a duplicate of another checked link.
func (Result) IsWarning
IsWarning returns true if the link has a warning status (redirect or blocked).
func (Result) StatusDisplay
StatusDisplay returns a formatted string for CLI display.
type Summary
type Summary struct {
Total int // Total links checked (including duplicates)
UniqueURLs int // Number of unique URLs actually checked
Alive int // Links that are alive (2xx)
Redirects int // Links that redirect to working pages
Blocked int // Links blocked by 403
Dead int // Links that are dead (4xx/5xx)
Errors int // Links that failed with network errors
Duplicates int // Duplicate occurrences
}
Summary provides statistics about check results.
func Summarize
Summarize creates a summary from a slice of results. It uses a single pass to count both unique URLs and status categories.
func (Summary) HasDeadLinks
HasDeadLinks returns true if there are dead links or errors (the condition for exit code 1).
func (Summary) WarningsCount
WarningsCount returns the total number of warnings (redirects + blocked).
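A single-pass summary can track unique URLs with a set while tallying status categories in the same loop. The sketch below uses hypothetical local types. Counting unique URLs with a map is an assumption; the package may derive the count differently, for example from non-duplicate results.

```go
package main

import "fmt"

type linkStatus int

const (
	statusAlive linkStatus = iota
	statusRedirect
	statusBlocked
	statusDead
	statusError
	statusDuplicate
)

type result struct {
	URL    string
	Status linkStatus
}

// summary mirrors the documented Summary fields.
type summary struct {
	Total, UniqueURLs, Alive, Redirects, Blocked, Dead, Errors, Duplicates int
}

// summarize counts unique URLs and status categories in one pass.
func summarize(results []result) summary {
	s := summary{Total: len(results)}
	seen := make(map[string]struct{}, len(results))
	for _, r := range results {
		if _, ok := seen[r.URL]; !ok {
			seen[r.URL] = struct{}{}
			s.UniqueURLs++
		}
		switch r.Status {
		case statusAlive:
			s.Alive++
		case statusRedirect:
			s.Redirects++
		case statusBlocked:
			s.Blocked++
		case statusDead:
			s.Dead++
		case statusError:
			s.Errors++
		case statusDuplicate:
			s.Duplicates++
		}
	}
	return s
}

// WarningsCount and HasDeadLinks follow the documented definitions.
func (s summary) WarningsCount() int { return s.Redirects + s.Blocked }
func (s summary) HasDeadLinks() bool { return s.Dead > 0 || s.Errors > 0 }

func main() {
	rs := []result{{"a", statusAlive}, {"b", statusRedirect}, {"c", statusDead}, {"a", statusDuplicate}}
	s := summarize(rs)
	fmt.Println(s.Total, s.UniqueURLs, s.WarningsCount(), s.HasDeadLinks()) // 4 3 1 true
}
```

A CLI would typically exit with status 1 when HasDeadLinks reports true, matching the exit-code condition described above.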