checker

package
v0.1.5 Latest
Published: Jan 3, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package checker verifies whether URLs are alive by making HTTP requests. It uses a worker pool to bound concurrency and retries transient failures with exponential backoff.
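The worker pool pattern mentioned above can be sketched as follows. This is a minimal illustration, not the package's implementation: `runPool` and the stand-in `check` function are hypothetical names, and a real checker would issue an HTTP request where the stand-in only inspects the string.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool is a hypothetical helper: it applies check to every URL using at
// most `workers` goroutines and returns the results in input order.
func runPool(urls []string, workers int, check func(string) bool) []bool {
	if workers < 1 {
		workers = 1
	}
	results := make([]bool, len(urls))
	jobs := make(chan int) // carries indexes into urls

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker, so no lock is needed.
				results[i] = check(urls[i])
			}
		}()
	}

	for i := range urls {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	urls := []string{"https://example.com", "", "https://example.org"}
	// Stand-in check (assumption): treat any non-empty URL as alive.
	alive := runPool(urls, 2, func(u string) bool { return u != "" })
	fmt.Println(alive) // prints [true false true]
}
```

The number of workers caps how many checks run at once, which is the role the Concurrency option plays in this package.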

Constants

const (
	// DefaultConcurrency is the number of concurrent workers.
	// Higher values speed up checking but use more resources.
	DefaultConcurrency = 50

	// DefaultTimeout is the maximum time to wait for a single HTTP request.
	// Most healthy servers respond within 2-3 seconds.
	DefaultTimeout = 5 * time.Second

	// DefaultMaxRetries is the number of times to retry a failed request.
	// Lower values speed up checking but make it more likely that a
	// transient failure is reported as a broken link.
	DefaultMaxRetries = 1

	// DefaultMaxRedirects is the maximum number of redirects to follow.
	// Most legitimate redirects are 1-2 hops.
	DefaultMaxRedirects = 5

	// DefaultUserAgent is the User-Agent header sent with requests.
	DefaultUserAgent = "gone-link-checker/1.0"
)

Default values for checker options. They favor fast checks while keeping results reliable.

const BrowserUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " +
	"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

BrowserUserAgent is a realistic browser User-Agent for bypassing bot detection.
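As an illustration of how a User-Agent value ends up on a request, here is a small sketch using only the standard library. The two constants are copied from the declarations above so the example is self-contained; `newRequest` is a hypothetical helper, and the choice of a HEAD request is an assumption rather than something the package documents.

```go
package main

import (
	"fmt"
	"net/http"
)

// Copied from the package constants so this sketch compiles on its own.
const (
	DefaultUserAgent = "gone-link-checker/1.0"
	BrowserUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " +
		"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

// newRequest is a hypothetical helper that builds a request carrying the
// given User-Agent. The HEAD method is an assumption made for illustration.
func newRequest(url, userAgent string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodHead, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	req, err := newRequest("https://example.com", BrowserUserAgent)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("User-Agent"))
}
```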

Variables

This section is empty.

Functions

This section is empty.

Types

type Checker

type Checker struct {
	// contains filtered or unexported fields
}

Checker performs concurrent link checking with configurable options.

func New

func New(opts Options) *Checker

New creates a new Checker with the given options.

func (*Checker) Check

func (c *Checker) Check(ctx context.Context, links []Link) <-chan Result

Check checks links concurrently using a worker pool and streams the results. URLs are deduplicated: each unique URL is checked once, and duplicate occurrences are reported as StatusDuplicate. The returned channel is closed once all links have been checked. Cancel the context to stop ongoing checks.
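The deduplication step described above might look like the sketch below. It is an assumption-laden illustration: `splitDuplicates` is a hypothetical helper, `Link` is a trimmed local stand-in, and URLs are compared as exact strings (whether the package normalizes URLs first is not stated).

```go
package main

import "fmt"

// Link is a local stand-in with only the fields this sketch needs.
type Link struct {
	URL      string
	FilePath string
}

// splitDuplicates returns the indexes of first occurrences (the links that
// would actually be checked) and a map from each duplicate's index to the
// index of its primary occurrence.
func splitDuplicates(links []Link) (primaries []int, duplicateOf map[int]int) {
	seen := make(map[string]int, len(links))
	duplicateOf = make(map[int]int)
	for i, l := range links {
		if first, ok := seen[l.URL]; ok {
			duplicateOf[i] = first // would be reported as StatusDuplicate
			continue
		}
		seen[l.URL] = i
		primaries = append(primaries, i)
	}
	return primaries, duplicateOf
}

func main() {
	links := []Link{
		{URL: "https://example.com", FilePath: "a.md"},
		{URL: "https://example.org", FilePath: "a.md"},
		{URL: "https://example.com", FilePath: "b.md"},
	}
	primaries, dups := splitDuplicates(links)
	fmt.Println(primaries, dups) // prints [0 1] map[2:0]
}
```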

func (*Checker) CheckAll

func (c *Checker) CheckAll(links []Link) []Result

CheckAll checks all links and returns the results once every check has completed. This is a blocking operation. The results slice is preallocated for efficiency.

type Link

type Link struct {
	URL      string // The URL to check
	FilePath string // Source file where the link was found
	Text     string // Link text (e.g., "Click here") for display purposes
	Line     int    // Line number in the source file (0 if unknown)
}

Link represents a URL to be checked. This is decoupled from parser.Link to keep the checker package independent.

type LinkStatus

type LinkStatus int

LinkStatus represents the category of a checked link.

const (
	// StatusAlive indicates the link returned a 2xx response.
	StatusAlive LinkStatus = iota
	// StatusRedirect indicates the link redirected but the final destination is alive.
	StatusRedirect
	// StatusBlocked indicates the server returned 403 (possible bot detection).
	StatusBlocked
	// StatusDead indicates the link is broken (4xx except 403, 5xx, or redirect to dead).
	StatusDead
	// StatusError indicates a network error (timeout, DNS failure, connection refused).
	StatusError
	// StatusDuplicate indicates this link was already checked (references primary result).
	StatusDuplicate
)
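The comments on these constants imply a mapping from HTTP outcomes to statuses. A minimal sketch of such a mapping follows; `classify` is a hypothetical function, the constants are redeclared locally, and the handling of cases the comments do not cover (for example a redirect that ends in 403, or a 1xx/3xx final code) is an assumption.

```go
package main

import "fmt"

// LinkStatus and its constants are redeclared so the sketch is self-contained.
type LinkStatus int

const (
	StatusAlive LinkStatus = iota
	StatusRedirect
	StatusBlocked
	StatusDead
	StatusError
	StatusDuplicate
)

// classify is a hypothetical mapping. code is the final HTTP status code
// (0 when the request failed); redirected reports whether at least one
// redirect was followed to reach it.
func classify(code int, redirected bool) LinkStatus {
	switch {
	case code == 0:
		return StatusError // network error: no HTTP response at all
	case code == 403:
		return StatusBlocked // possible bot detection
	case code >= 200 && code < 300:
		if redirected {
			return StatusRedirect
		}
		return StatusAlive
	default:
		// 4xx (except 403), 5xx, and anything else is treated as dead here.
		return StatusDead
	}
}

func main() {
	fmt.Println(classify(200, false) == StatusAlive)   // prints true
	fmt.Println(classify(200, true) == StatusRedirect) // prints true
	fmt.Println(classify(404, true) == StatusDead)     // prints true
}
```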

func (LinkStatus) Description

func (s LinkStatus) Description() string

Description returns a human-readable explanation of the status. Uses pre-defined strings to avoid allocations.

func (LinkStatus) Label

func (s LinkStatus) Label() string

Label returns a short label for display (e.g., in badges). Uses pre-defined strings to avoid allocations.

func (LinkStatus) String

func (s LinkStatus) String() string

String returns the string representation of the status. Uses pre-defined strings to avoid allocations.
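One common way to return predefined strings without allocating is a lookup table indexed by the status value. The sketch below shows that technique; the wording of the strings is an assumption, since the package's actual strings are not shown in this documentation.

```go
package main

import "fmt"

type LinkStatus int

const (
	StatusAlive LinkStatus = iota
	StatusRedirect
	StatusBlocked
	StatusDead
	StatusError
	StatusDuplicate
)

// statusNames holds one string per status. The exact wording is assumed.
var statusNames = [...]string{
	StatusAlive:     "alive",
	StatusRedirect:  "redirect",
	StatusBlocked:   "blocked",
	StatusDead:      "dead",
	StatusError:     "error",
	StatusDuplicate: "duplicate",
}

// String returns an entry from the table, so no string is built at call time.
func (s LinkStatus) String() string {
	if s < 0 || int(s) >= len(statusNames) {
		return "unknown"
	}
	return statusNames[s]
}

func main() {
	fmt.Println(StatusDead, LinkStatus(42)) // prints dead unknown
}
```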

type Options

type Options struct {
	// UserAgent is the User-Agent header sent with requests.
	// Some servers block requests without a proper User-Agent.
	UserAgent string

	// Concurrency is the number of concurrent workers checking links.
	// Higher values speed up checking but use more resources.
	Concurrency int

	// Timeout is the maximum time to wait for a single HTTP request.
	// This includes connection, TLS handshake, and response headers.
	Timeout time.Duration

	// MaxRetries is the number of times to retry a failed request.
	// Only transient errors (timeouts, 5xx, 429) are retried.
	MaxRetries int

	// MaxRedirects is the maximum number of redirects to follow.
	MaxRedirects int
}

Options configures the behavior of the link checker.
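The MaxRetries comment above says only transient errors (timeouts, 5xx, 429) are retried. A rough sketch of retry with exponential backoff under that rule is shown below. `isTransient` and `withRetry` are hypothetical names, and the base delay and doubling factor are assumptions; the package does not document its actual backoff schedule.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// isTransient reports whether an attempt is worth retrying: a network
// timeout, a 429, or a 5xx response.
func isTransient(code int, err error) bool {
	if err != nil {
		var netErr net.Error
		return errors.As(err, &netErr) && netErr.Timeout()
	}
	return code == http.StatusTooManyRequests || (code >= 500 && code <= 599)
}

// withRetry calls attempt once, then up to maxRetries more times while the
// outcome is transient, doubling the delay between attempts.
func withRetry(maxRetries int, base time.Duration, attempt func() (int, error)) (int, error) {
	delay := base
	for try := 0; ; try++ {
		code, err := attempt()
		if try >= maxRetries || !isTransient(code, err) {
			return code, err
		}
		time.Sleep(delay)
		delay *= 2 // base, 2*base, 4*base, ...
	}
}

func main() {
	calls := 0
	// Simulated endpoint: fails twice with 503, then succeeds.
	code, err := withRetry(2, 10*time.Millisecond, func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusServiceUnavailable, nil
		}
		return http.StatusOK, nil
	})
	fmt.Println(code, err, calls) // prints 200 <nil> 3
}
```

A production version would normally also respect context cancellation while sleeping and might add jitter to the delay.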

func DefaultOptions

func DefaultOptions() Options

DefaultOptions returns an optimized default configuration. The defaults prioritize speed while maintaining reasonable reliability.

func (Options) WithConcurrency

func (o Options) WithConcurrency(n int) Options

WithConcurrency sets the number of concurrent workers.

func (Options) WithMaxRedirects

func (o Options) WithMaxRedirects(n int) Options

WithMaxRedirects sets the maximum number of redirects to follow.

func (Options) WithMaxRetries

func (o Options) WithMaxRetries(n int) Options

WithMaxRetries sets the maximum retry count.

func (Options) WithTimeout

func (o Options) WithTimeout(d time.Duration) Options

WithTimeout sets the request timeout.

func (Options) WithUserAgent

func (o Options) WithUserAgent(ua string) Options

WithUserAgent sets the User-Agent header.
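Because the With* methods take a value receiver and return Options, they can be chained without modifying the original value. The sketch below reproduces that shape with a local copy of the type. The method bodies are assumptions (the real ones may validate their arguments), as is the idea that DefaultOptions simply returns the documented Default* constants.

```go
package main

import (
	"fmt"
	"time"
)

// Options mirrors the documented struct so the sketch is self-contained.
type Options struct {
	UserAgent    string
	Concurrency  int
	Timeout      time.Duration
	MaxRetries   int
	MaxRedirects int
}

// DefaultOptions is assumed to return the documented Default* values.
func DefaultOptions() Options {
	return Options{
		UserAgent:    "gone-link-checker/1.0",
		Concurrency:  50,
		Timeout:      5 * time.Second,
		MaxRetries:   1,
		MaxRedirects: 5,
	}
}

// Each method modifies its own copy of o and returns it.
func (o Options) WithConcurrency(n int) Options {
	o.Concurrency = n
	return o
}

func (o Options) WithTimeout(d time.Duration) Options {
	o.Timeout = d
	return o
}

func main() {
	base := DefaultOptions()
	tuned := base.WithConcurrency(10).WithTimeout(10 * time.Second)
	fmt.Println(base.Concurrency, tuned.Concurrency) // prints 50 10
	fmt.Println(base.Timeout, tuned.Timeout)         // prints 5s 10s
}
```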

type Redirect

type Redirect struct {
	URL        string // The URL that redirected
	StatusCode int    // The redirect status code (301, 302, 307, 308)
}

Redirect represents a single hop in a redirect chain.
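A redirect chain like this can be recorded with the standard library's `http.Client.CheckRedirect` hook, which is also a natural place to enforce a redirect limit. The sketch below is one possible approach, not necessarily what the package does: `follow` and `newDemoServer` are hypothetical, `Redirect` is redeclared locally, and the demo server is a local `httptest` server.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type Redirect struct {
	URL        string // The URL that redirected
	StatusCode int    // The redirect status code
}

// follow issues a GET, records each hop, and stops after maxRedirects hops.
func follow(url string, maxRedirects int) ([]Redirect, string, int, error) {
	var chain []Redirect
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) > maxRedirects {
				return errors.New("too many redirects")
			}
			// req.Response is the redirect response that led to req.
			chain = append(chain, Redirect{
				URL:        via[len(via)-1].URL.String(),
				StatusCode: req.Response.StatusCode,
			})
			return nil
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return chain, "", 0, err
	}
	defer resp.Body.Close()
	return chain, resp.Request.URL.String(), resp.StatusCode, nil
}

// newDemoServer serves /a -> 301 -> /b -> 302 -> /c (200).
func newDemoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/a", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/b", http.StatusMovedPermanently)
	})
	mux.HandleFunc("/b", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/c", http.StatusFound)
	})
	mux.HandleFunc("/c", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newDemoServer()
	defer srv.Close()
	chain, final, status, err := follow(srv.URL+"/a", 5)
	fmt.Println(len(chain), final == srv.URL+"/c", status, err)
}
```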

type Result

type Result struct {
	// Duplicate info (populated when Status == StatusDuplicate)
	DuplicateOf *Result // Points to primary result if this is a duplicate
	Link        Link    // The original link that was checked
	Error       string  // Error message if applicable

	FinalURL string // Final destination URL after following redirects

	// Redirect info (populated when redirects occurred)
	RedirectChain []Redirect // Full chain of redirects
	StatusCode    int        // HTTP status code (0 if request failed)
	Status        LinkStatus // Computed status category
	FinalStatus   int        // Status code of final destination
}

Result represents the outcome of checking a single link.

func FilterAlive

func FilterAlive(results []Result) []Result

FilterAlive returns only the results where the link is alive. Preallocates slice capacity, since alive links are typically the majority.

func FilterByStatus

func FilterByStatus(results []Result, status LinkStatus) []Result

FilterByStatus returns results matching the given status. Pre-allocates slice capacity based on expected ratio.

func FilterDead

func FilterDead(results []Result) []Result

FilterDead returns results that are dead or errored. Pre-allocates slice capacity based on expected ratio.

func FilterDuplicates

func FilterDuplicates(results []Result) []Result

FilterDuplicates returns only duplicate results. Pre-allocates slice capacity based on expected ratio.

func FilterWarnings

func FilterWarnings(results []Result) []Result

FilterWarnings returns results with warning status (redirect or blocked). Pre-allocates slice capacity based on expected ratio.
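The filter functions above share one shape: walk the results once and append the matches to a slice whose capacity is reserved up front. A minimal sketch of that shape follows. `filterByStatus` is a hypothetical lowercase stand-in, `Result` is trimmed to two fields, and the one-quarter capacity guess is an assumption, since the "expected ratio" is not documented.

```go
package main

import "fmt"

type LinkStatus int

const (
	StatusAlive LinkStatus = iota
	StatusRedirect
	StatusBlocked
	StatusDead
	StatusError
	StatusDuplicate
)

// Result is a trimmed local stand-in for the documented struct.
type Result struct {
	URL    string
	Status LinkStatus
}

// filterByStatus returns the results whose Status equals status.
func filterByStatus(results []Result, status LinkStatus) []Result {
	// Reserve room for a guessed share of matches to limit regrowth.
	out := make([]Result, 0, len(results)/4)
	for _, r := range results {
		if r.Status == status {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	results := []Result{
		{URL: "https://example.com", Status: StatusAlive},
		{URL: "https://example.org/missing", Status: StatusDead},
		{URL: "https://example.net", Status: StatusAlive},
	}
	dead := filterByStatus(results, StatusDead)
	fmt.Println(len(dead), dead[0].URL) // prints 1 https://example.org/missing
}
```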

func (Result) IsAlive

func (r Result) IsAlive() bool

IsAlive returns true if the link is considered alive (2xx response). Kept for backward compatibility.

func (Result) IsDead

func (r Result) IsDead() bool

IsDead returns true if the link is dead or errored.

func (Result) IsDuplicate

func (r Result) IsDuplicate() bool

IsDuplicate returns true if this is a duplicate of another checked link.

func (Result) IsWarning

func (r Result) IsWarning() bool

IsWarning returns true if the link has a warning status (redirect or blocked).

func (Result) StatusDisplay

func (r Result) StatusDisplay() string

StatusDisplay returns a formatted string for CLI display.

type Summary

type Summary struct {
	Total      int // Total links checked (including duplicates)
	UniqueURLs int // Number of unique URLs actually checked
	Alive      int // Links that are alive (2xx)
	Redirects  int // Links that redirect to working pages
	Blocked    int // Links blocked by 403
	Dead       int // Links that are dead (4xx except 403, or 5xx)
	Errors     int // Links that failed with network errors
	Duplicates int // Duplicate occurrences
}

Summary provides statistics about check results.

func Summarize

func Summarize(results []Result) Summary

Summarize creates a summary from a slice of results. Uses a single pass to count both unique URLs and status categories.
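The single-pass counting described above could be sketched like this. It is an illustration under assumptions: `summarize` is a hypothetical stand-in, `Result` is trimmed to a URL and a status, and unique URLs are counted by exact string match.

```go
package main

import "fmt"

type LinkStatus int

const (
	StatusAlive LinkStatus = iota
	StatusRedirect
	StatusBlocked
	StatusDead
	StatusError
	StatusDuplicate
)

// Result is a trimmed local stand-in for the documented struct.
type Result struct {
	URL    string
	Status LinkStatus
}

type Summary struct {
	Total, UniqueURLs, Alive, Redirects, Blocked, Dead, Errors, Duplicates int
}

// summarize counts statuses and unique URLs in one loop over results.
func summarize(results []Result) Summary {
	s := Summary{Total: len(results)}
	unique := make(map[string]struct{}, len(results))
	for _, r := range results {
		unique[r.URL] = struct{}{}
		switch r.Status {
		case StatusAlive:
			s.Alive++
		case StatusRedirect:
			s.Redirects++
		case StatusBlocked:
			s.Blocked++
		case StatusDead:
			s.Dead++
		case StatusError:
			s.Errors++
		case StatusDuplicate:
			s.Duplicates++
		}
	}
	s.UniqueURLs = len(unique)
	return s
}

func main() {
	s := summarize([]Result{
		{URL: "https://example.com", Status: StatusAlive},
		{URL: "https://example.com", Status: StatusDuplicate},
		{URL: "https://example.org/missing", Status: StatusDead},
	})
	fmt.Println(s.Total, s.UniqueURLs, s.Alive, s.Dead, s.Duplicates) // prints 3 2 1 1 1
}
```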

func (Summary) HasDeadLinks

func (s Summary) HasDeadLinks() bool

HasDeadLinks returns true if there are dead links or errors. This is the condition for exit code 1.

func (Summary) HasIssues

func (s Summary) HasIssues() bool

HasIssues returns true if there are any warnings or dead links.

func (Summary) WarningsCount

func (s Summary) WarningsCount() int

WarningsCount returns total warnings (redirects + blocked).
