checker

package
v0.1.0
Published: Jan 1, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package checker verifies whether URLs are alive by making HTTP requests. It uses a worker pool for bounded concurrency and retries transient failures with exponential backoff.

Constants

const BrowserUserAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " +
	"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

BrowserUserAgent is a realistic browser User-Agent for bypassing bot detection.

Variables

This section is empty.

Functions

This section is empty.

Types

type Checker

type Checker struct {
	// contains filtered or unexported fields
}

Checker performs concurrent link checking with configurable options.

func New

func New(opts Options) *Checker

New creates a new Checker with the given options.

func (*Checker) Check

func (c *Checker) Check(ctx context.Context, links []Link) <-chan Result

Check checks links concurrently using a worker pool and streams results on the returned channel. URLs are deduplicated: each unique URL is checked once, and every further occurrence is reported as StatusDuplicate. The channel is closed once all links have been checked. Cancel the context to stop ongoing checks.

func (*Checker) CheckAll

func (c *Checker) CheckAll(links []Link) []Result

CheckAll checks all links and returns results after all are complete. This is a blocking operation.

type Link

type Link struct {
	URL      string // The URL to check
	FilePath string // Source file where the link was found
	Line     int    // Line number in the source file (0 if unknown)
	Text     string // Link text (e.g., "Click here") for display purposes
}

Link represents a URL to be checked. This is decoupled from parser.Link to keep the checker package independent.

type LinkStatus

type LinkStatus int

LinkStatus represents the category of a checked link.

const (
	// StatusAlive indicates the link returned a 2xx response.
	StatusAlive LinkStatus = iota
	// StatusRedirect indicates the link redirected but the final destination is alive.
	StatusRedirect
	// StatusBlocked indicates the server returned 403 (possible bot detection).
	StatusBlocked
	// StatusDead indicates the link is broken (4xx other than 403, 5xx, or a redirect to a dead destination).
	StatusDead
	// StatusError indicates a network error (timeout, DNS failure, connection refused).
	StatusError
	// StatusDuplicate indicates this link was already checked (references primary result).
	StatusDuplicate
)
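One way to read the constants above is as a mapping from the final HTTP status code to a category. The sketch below mirrors the constant names locally and uses a hypothetical classify function; the exact rules, such as how a 403 reached through a redirect is treated, are assumptions rather than documented behavior. StatusDuplicate is assigned by deduplication, not by status code, so classify never returns it.

```go
package main

import "fmt"

// LinkStatus is a local mirror of the documented type.
type LinkStatus int

const (
	StatusAlive LinkStatus = iota
	StatusRedirect
	StatusBlocked
	StatusDead
	StatusError
	StatusDuplicate
)

// classify maps the final status code to a category.
// A code of 0 stands for a request that failed before any response arrived.
// redirected reports whether at least one redirect was followed.
func classify(finalCode int, redirected bool) LinkStatus {
	switch {
	case finalCode == 0:
		return StatusError
	case finalCode >= 200 && finalCode < 300:
		if redirected {
			return StatusRedirect
		}
		return StatusAlive
	case finalCode == 403:
		return StatusBlocked
	default:
		// Remaining 4xx, all 5xx, and redirects that end on a dead page.
		return StatusDead
	}
}

func main() {
	fmt.Println(classify(200, false) == StatusAlive)
	fmt.Println(classify(200, true) == StatusRedirect)
	fmt.Println(classify(403, false) == StatusBlocked)
	fmt.Println(classify(404, true) == StatusDead)
}
```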

func (LinkStatus) Description

func (s LinkStatus) Description() string

Description returns a human-readable explanation of the status.

func (LinkStatus) Label

func (s LinkStatus) Label() string

Label returns a short label for display (e.g., in badges).

func (LinkStatus) String

func (s LinkStatus) String() string

String returns the string representation of the status.

type Options

type Options struct {
	// Concurrency is the number of concurrent workers checking links.
	// Higher values check faster but use more resources.
	// Default: 10
	Concurrency int

	// Timeout is the maximum time to wait for a single HTTP request.
	// This includes connection, TLS handshake, and response headers.
	// Default: 10s
	Timeout time.Duration

	// MaxRetries is the number of times to retry a failed request.
	// Only transient errors (timeouts, 5xx, 429) are retried.
	// Default: 2
	MaxRetries int

	// MaxRedirects is the maximum number of redirects to follow.
	// Default: 10
	MaxRedirects int

	// UserAgent is the User-Agent header sent with requests.
	// Some servers block requests without a proper User-Agent.
	// Default: "gone-link-checker/1.0"
	UserAgent string
}

Options configures the behavior of the link checker.

func DefaultOptions

func DefaultOptions() Options

DefaultOptions returns sensible default configuration.

func (Options) WithConcurrency

func (o Options) WithConcurrency(n int) Options

WithConcurrency sets the number of concurrent workers.

func (Options) WithMaxRedirects

func (o Options) WithMaxRedirects(n int) Options

WithMaxRedirects sets the maximum number of redirects to follow.

func (Options) WithMaxRetries

func (o Options) WithMaxRetries(n int) Options

WithMaxRetries sets the maximum retry count.

func (Options) WithTimeout

func (o Options) WithTimeout(d time.Duration) Options

WithTimeout sets the request timeout.

func (Options) WithUserAgent

func (o Options) WithUserAgent(ua string) Options

WithUserAgent sets the User-Agent header.

type Redirect

type Redirect struct {
	URL        string // The URL that redirected
	StatusCode int    // The redirect status code (301, 302, 307, 308)
}

Redirect represents a single hop in a redirect chain.
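How the package records each hop is not shown on this page. One common approach with net/http is the Client.CheckRedirect hook, sketched below against a local test server. The follow and demoServer helpers and the local Redirect mirror are hypothetical, and enforcing a redirect limit inside the same hook is an assumption about how MaxRedirects might be applied.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Redirect is a local mirror of the documented type.
type Redirect struct {
	URL        string
	StatusCode int
}

// follow requests url, records every redirect hop, and returns the chain
// together with the final status code. It fails once more than maxRedirects
// hops would be needed.
func follow(url string, maxRedirects int) ([]Redirect, int, error) {
	var chain []Redirect
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			// len(via) equals the number of the hop about to be followed.
			if len(via) > maxRedirects {
				return errors.New("too many redirects")
			}
			chain = append(chain, Redirect{
				URL:        via[len(via)-1].URL.String(), // request that got redirected
				StatusCode: req.Response.StatusCode,      // response that caused the hop
			})
			return nil
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return chain, 0, err
	}
	defer resp.Body.Close()
	return chain, resp.StatusCode, nil
}

// demoServer serves /old -> /mid -> /new so the chain has two hops.
func demoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/old", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/mid", http.StatusMovedPermanently)
	})
	mux.HandleFunc("/mid", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/new", http.StatusFound)
	})
	mux.HandleFunc("/new", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := demoServer()
	defer srv.Close()

	chain, code, err := follow(srv.URL+"/old", 10)
	fmt.Println(len(chain), code, err)
}
```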

type Result

type Result struct {
	Link       Link       // The original link that was checked
	StatusCode int        // HTTP status code (0 if request failed)
	Status     LinkStatus // Computed status category
	Error      string     // Error message if applicable

	// Redirect info (populated when redirects occurred)
	RedirectChain []Redirect // Full chain of redirects
	FinalURL      string     // Final destination URL after following redirects
	FinalStatus   int        // Status code of final destination

	// Duplicate info (populated when Status == StatusDuplicate)
	DuplicateOf *Result // Points to primary result if this is a duplicate
}

Result represents the outcome of checking a single link.

func FilterAlive

func FilterAlive(results []Result) []Result

FilterAlive returns only the results where the link is alive.

func FilterByStatus

func FilterByStatus(results []Result, status LinkStatus) []Result

FilterByStatus returns results matching the given status.

func FilterDead

func FilterDead(results []Result) []Result

FilterDead returns results that are dead or errored.

func FilterDuplicates

func FilterDuplicates(results []Result) []Result

FilterDuplicates returns only duplicate results.

func FilterWarnings

func FilterWarnings(results []Result) []Result

FilterWarnings returns results with warning status (redirect or blocked).
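The filter helpers above all select results by status. A plausible shape for them, written against a trimmed local mirror of Result and LinkStatus, is shown below; the shared filter helper is hypothetical and the real implementations may differ.

```go
package main

import "fmt"

// LinkStatus and Result are trimmed local mirrors of the documented types.
type LinkStatus int

const (
	StatusAlive LinkStatus = iota
	StatusRedirect
	StatusBlocked
	StatusDead
	StatusError
	StatusDuplicate
)

type Result struct {
	URL    string
	Status LinkStatus
}

// filter returns the results for which keep reports true, in input order.
func filter(results []Result, keep func(Result) bool) []Result {
	var out []Result
	for _, r := range results {
		if keep(r) {
			out = append(out, r)
		}
	}
	return out
}

// FilterByStatus keeps results with exactly the given status.
func FilterByStatus(results []Result, status LinkStatus) []Result {
	return filter(results, func(r Result) bool { return r.Status == status })
}

// FilterDead keeps results that are dead or errored.
func FilterDead(results []Result) []Result {
	return filter(results, func(r Result) bool {
		return r.Status == StatusDead || r.Status == StatusError
	})
}

// FilterWarnings keeps results that redirected or were blocked.
func FilterWarnings(results []Result) []Result {
	return filter(results, func(r Result) bool {
		return r.Status == StatusRedirect || r.Status == StatusBlocked
	})
}

func main() {
	results := []Result{
		{"https://ok.example", StatusAlive},
		{"https://moved.example", StatusRedirect},
		{"https://gone.example", StatusDead},
		{"https://down.example", StatusError},
	}
	fmt.Println(len(FilterDead(results)), len(FilterWarnings(results)))
}
```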

func (Result) IsAlive

func (r Result) IsAlive() bool

IsAlive returns true if the link is considered alive (2xx response). Kept for backward compatibility.

func (Result) IsDead

func (r Result) IsDead() bool

IsDead returns true if the link is dead or errored.

func (Result) IsDuplicate

func (r Result) IsDuplicate() bool

IsDuplicate returns true if this is a duplicate of another checked link.

func (Result) IsWarning

func (r Result) IsWarning() bool

IsWarning returns true if the link has a warning status (redirect or blocked).

func (Result) StatusDisplay

func (r Result) StatusDisplay() string

StatusDisplay returns a formatted string for CLI display.

type Summary

type Summary struct {
	Total      int // Total links checked (including duplicates)
	UniqueURLs int // Number of unique URLs actually checked
	Alive      int // Links that are alive (2xx)
	Redirects  int // Links that redirect to working pages
	Blocked    int // Links blocked by 403
	Dead       int // Links that are dead (4xx other than 403, or 5xx)
	Errors     int // Links that failed with network errors
	Duplicates int // Duplicate occurrences
}

Summary provides statistics about check results.

func Summarize

func Summarize(results []Result) Summary

Summarize creates a summary from a slice of results.

func (Summary) HasDeadLinks

func (s Summary) HasDeadLinks() bool

HasDeadLinks returns true if there are dead links or errors. This is the condition for exit code 1.

func (Summary) HasIssues

func (s Summary) HasIssues() bool

HasIssues returns true if there are any warnings or dead links.

func (Summary) WarningsCount

func (s Summary) WarningsCount() int

WarningsCount returns total warnings (redirects + blocked).
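The following sketch shows how Summarize and the Summary methods could fit together, using trimmed local mirrors of the documented types. The method bodies are assumptions inferred from the descriptions above; in particular, computing UniqueURLs as Total minus Duplicates is a guess, not documented behavior.

```go
package main

import "fmt"

// LinkStatus, Result and Summary are trimmed local mirrors of the documented types.
type LinkStatus int

const (
	StatusAlive LinkStatus = iota
	StatusRedirect
	StatusBlocked
	StatusDead
	StatusError
	StatusDuplicate
)

type Result struct {
	URL    string
	Status LinkStatus
}

type Summary struct {
	Total, UniqueURLs, Alive, Redirects, Blocked, Dead, Errors, Duplicates int
}

// Summarize counts results per status category.
func Summarize(results []Result) Summary {
	s := Summary{Total: len(results)}
	for _, r := range results {
		switch r.Status {
		case StatusAlive:
			s.Alive++
		case StatusRedirect:
			s.Redirects++
		case StatusBlocked:
			s.Blocked++
		case StatusDead:
			s.Dead++
		case StatusError:
			s.Errors++
		case StatusDuplicate:
			s.Duplicates++
		}
	}
	s.UniqueURLs = s.Total - s.Duplicates // assumption, see lead-in
	return s
}

func (s Summary) HasDeadLinks() bool { return s.Dead > 0 || s.Errors > 0 }
func (s Summary) WarningsCount() int { return s.Redirects + s.Blocked }
func (s Summary) HasIssues() bool    { return s.HasDeadLinks() || s.WarningsCount() > 0 }

func main() {
	s := Summarize([]Result{
		{"https://ok.example", StatusAlive},
		{"https://moved.example", StatusRedirect},
		{"https://gone.example", StatusDead},
		{"https://ok.example", StatusDuplicate},
	})

	// A CLI would typically pass this value to os.Exit.
	exitCode := 0
	if s.HasDeadLinks() {
		exitCode = 1
	}
	fmt.Println(s.Total, s.UniqueURLs, s.WarningsCount(), exitCode)
}
```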
