deadpoll

v0.1.2 · Published: Apr 7, 2026 · License: MIT · Imports: 17 · Imported by: 0

Repository

github.com/MacksMind/deadpoll

A fast, concurrent link checker for web applications. Crawls a target site, follows internal links, and reports broken URLs as JSONL.

Built with colly for speed — HTTP-level crawling with no browser overhead.

Background

deadpoll replaces LinkChecker for our use case. LinkChecker threads HTTP retrieval but parses sequentially, which becomes the bottleneck when hitting local targets where network latency is negligible. deadpoll parallelizes both retrieval and parsing via colly's async mode and Go's concurrency model.
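
As a rough illustration of the difference, the sketch below uses a plain worker pool in which each goroutine handles a URL end to end, so no stage is serialized. It is not deadpoll's code: colly's async mode handles scheduling internally, and `checkAll` and the stub `check` function are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// checkAll runs check for every URL on a fixed number of worker goroutines.
// Each worker does the whole job for a URL (in a real crawler: fetch and
// parse), so neither stage becomes a sequential bottleneck.
func checkAll(urls []string, workers int, check func(string) int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]int, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				status := check(u)
				mu.Lock()
				results[u] = status
				mu.Unlock()
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Stub standing in for "fetch the page and parse it".
	check := func(u string) int {
		if u == "https://example.com/missing" {
			return 404
		}
		return 200
	}
	urls := []string{"https://example.com/", "https://example.com/missing"}
	for u, status := range checkAll(urls, 4, check) {
		fmt.Println(status, u)
	}
}
```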

Configuration is modeled after LinkChecker's linkcheckerrc format. Only a subset of features is implemented: the ones needed for the current use case. The table below shows what maps over and what doesn't.

Feature coverage vs LinkChecker

LinkChecker feature	deadpoll	Notes
[checking] `threads`	`[checking] threads`	Same concept
[checking] `timeout`	`[checking] timeout`	Go duration string instead of integer seconds
[checking] `recursionlevel`	`[checking] max_depth`	0 = unlimited in deadpoll; LinkChecker uses a negative value for unlimited
[checking] `sslverify`	`[checking] ssl_verify`	Same concept
[filtering] `checkextern`	`[filtering] check_extern`	Same concept
[filtering] `ignore`	`[filtering] ignore`	Regex list, same concept
[filtering] `nofollow`	`[filtering] nofollow`	Regex list, same concept
[authentication] `entry`	`[cookies]` / `--cookie`	Simplified to raw cookie values instead of HTTP auth
[filtering] `ignore_unless`	`[filtering] ignore_unless`	deadpoll extension — see Regex notes
[checking] `useragent`	—	Not implemented
[checking] `maxrunseconds`	—	Not implemented
[checking] `robotstxt`	—	Not implemented
[checking] `cookiefile`	—	Not implemented
[authentication] `loginurl`	—	Not implemented
[output] section	—	deadpoll always outputs JSONL
[text/html/csv/xml/...]	—	Single output format (JSONL)
Plugin sections	—	Not implemented
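
The `timeout` row in the table above is the one most likely to trip up a migrated config, since a bare integer is no longer valid. The sketch below shows what a Go duration string looks like in practice. It assumes the value is parsed with the standard library's `time.ParseDuration`; the README does not say which function deadpoll uses, and `parseTimeout` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// parseTimeout is a hypothetical helper that converts a duration string such
// as "30s" or "1m30s" into a time.Duration. A bare integer like "30" has no
// unit and is rejected, unlike LinkChecker's integer-seconds setting.
func parseTimeout(s string) (time.Duration, error) {
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"30s", "1m30s", "30"} {
		d, err := parseTimeout(s)
		if err != nil {
			fmt.Printf("%q: invalid duration: %v\n", s, err)
			continue
		}
		fmt.Printf("%q parses to %v\n", s, d)
	}
}
```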

Install

Requires Go 1.23+.

go install github.com/MacksMind/deadpoll@latest

Or build from source:

go build -o deadpoll .

Usage

deadpoll [flags] <target-url>

Flags

Flag	Default	Description
`--config`		Path to TOML config file
`--output`	`deadpoll-results.jsonl`	Output file (`-` for stdout)
`--cookie`		Cookie sent with every request (`name=value`)
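
To make the `name=value` form of `--cookie` concrete, here is a minimal sketch of turning such a string into a request cookie with the standard library. How deadpoll actually parses the flag is not documented here; `parseCookieFlag` is a hypothetical helper, and splitting on the first `=` is an assumption chosen so that values containing `=` survive intact.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// parseCookieFlag is a hypothetical helper. It splits "name=value" on the
// first "=" so that values containing "=" (for example base64 padding) are
// kept whole.
func parseCookieFlag(raw string) (*http.Cookie, error) {
	name, value, ok := strings.Cut(raw, "=")
	if !ok || name == "" {
		return nil, fmt.Errorf("cookie %q is not in name=value form", raw)
	}
	return &http.Cookie{Name: name, Value: value}, nil
}

func main() {
	c, err := parseCookieFlag("_session_id=abc123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	req, err := http.NewRequest("GET", "https://example.com", nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	req.AddCookie(c) // attached to this request only; nothing is sent here
	fmt.Println("Cookie header:", req.Header.Get("Cookie"))
}
```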

Examples

# Basic crawl
deadpoll https://example.com

# With config and auth cookie
deadpoll --config mysite.toml --cookie "_session_id=abc123" https://example.com

# Output to stdout
deadpoll --output - https://example.com

Output

Errors are written as JSONL (one JSON object per line):

{"url":"https://example.com/missing","status":404,"error":"Not Found","parent":"https://example.com/page","timestamp":"2026-04-07T00:13:06-04:00"}

Field	Description
`url`	The broken URL
`status`	HTTP status code (0 if connection failed)
`error`	Description of the failure
`parent`	The page that linked to the broken URL
`timestamp`	When the error was recorded (RFC 3339)
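
For downstream tooling, the fields above map directly onto a small struct. The following is a sketch of a reader for the output, not part of deadpoll itself: the `Result` type, its Go field names, and `parseResults` are assumptions, and only the JSON keys come from the table.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// Result mirrors the documented JSONL fields. The type and field names are
// assumptions; only the JSON keys are taken from the README.
type Result struct {
	URL       string    `json:"url"`
	Status    int       `json:"status"` // 0 if the connection failed
	Error     string    `json:"error"`
	Parent    string    `json:"parent"`
	Timestamp time.Time `json:"timestamp"` // RFC 3339
}

// parseResults decodes one Result per non-empty line of JSONL input.
func parseResults(jsonl string) ([]Result, error) {
	var out []Result
	sc := bufio.NewScanner(strings.NewReader(jsonl))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var r Result
		if err := json.Unmarshal([]byte(line), &r); err != nil {
			return nil, err
		}
		out = append(out, r)
	}
	return out, sc.Err()
}

func main() {
	sample := `{"url":"https://example.com/missing","status":404,"error":"Not Found","parent":"https://example.com/page","timestamp":"2026-04-07T00:13:06-04:00"}`
	results, err := parseResults(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%d %s (linked from %s)\n", r.Status, r.URL, r.Parent)
	}
}
```

In a real workflow the input would come from the output file (or stdin when `--output -` is used) rather than a string literal.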

A progress heartbeat prints to stderr every 5 seconds. Exit code is 0 for clean runs, 1 if any errors were found.

The JSONL format is a deliberate choice for automated workflows. The intended pattern is to run deadpoll, then have an AI agent read the output file and propose fixes. This is why there's a single output format rather than the HTML/CSV/text options that LinkChecker provides.

Configuration

See config-example.toml for a fully annotated config file with patterns for common scenarios (session-mutating endpoints, destructive GET actions, pagination, etc.).

`check_extern`

When false (default), external links are silently skipped — not checked, not reported. When true, external links are checked (an HTTP request is made to verify they respond) but not crawled — the crawler will not follow links found on external pages. As a safety measure, AllowedDomains is set at the colly level when check_extern is false, ensuring the crawler cannot escape the target domain even if the filtering logic has a bug.
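
The three outcomes described above (skip, check only, check and crawl) can be summarized as a small decision function. This is a sketch under stated assumptions, not deadpoll's implementation: `decide` and the `Action` constants are hypothetical, links are assumed to be absolute already, and "internal" is taken to mean an exact host match, which may differ from the real rule.

```go
package main

import (
	"fmt"
	"net/url"
)

// Action describes what a crawler does with a discovered link.
type Action string

const (
	Skip          Action = "skip"        // not requested, not reported
	CheckOnly     Action = "check"       // requested, but its links are not followed
	CheckAndCrawl Action = "check+crawl" // requested and parsed for further links
)

// decide is a hypothetical helper. It treats a link as internal when its host
// exactly matches the target's host; both URLs must be absolute.
func decide(target, link string, checkExtern bool) (Action, error) {
	t, err := url.Parse(target)
	if err != nil {
		return Skip, err
	}
	l, err := url.Parse(link)
	if err != nil {
		return Skip, err
	}
	switch {
	case l.Host == t.Host:
		return CheckAndCrawl, nil
	case checkExtern:
		return CheckOnly, nil
	default:
		return Skip, nil
	}
}

func main() {
	target := "https://example.com"
	links := []string{"https://example.com/about", "https://other.example.org/docs"}
	for _, checkExtern := range []bool{false, true} {
		for _, link := range links {
			a, err := decide(target, link, checkExtern)
			if err != nil {
				fmt.Println("bad URL:", err)
				continue
			}
			fmt.Printf("check_extern=%-5v %-32s -> %s\n", checkExtern, link, a)
		}
	}
}
```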

Regex notes

All patterns (ignore, nofollow, ignore_unless) use Go's regexp package, which implements RE2 syntax. This means no lookaheads or lookbehinds — patterns like ^(?!.*/catalog/).*page= will fail with a parse error.

The ignore_unless rule exists specifically to solve this. Instead of a negative lookahead, express it as two conditions:

# "Ignore page= unless the URL also contains /catalog/"
[[filtering.ignore_unless]]
pattern = "page="
unless  = "/catalog/"
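
The sketch below shows both halves of this in Go: the lookahead pattern failing to compile, and the same intent expressed as two RE2-compatible patterns. The `ignoreUnless` type and its methods are hypothetical names for illustration; the README only specifies the `pattern` and `unless` keys, and the evaluation shown (match `pattern` and do not match `unless`) is an assumption based on the comment in the config snippet.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignoreUnless is a hypothetical in-memory form of one
// [[filtering.ignore_unless]] entry.
type ignoreUnless struct {
	pattern *regexp.Regexp
	unless  *regexp.Regexp
}

// newIgnoreUnless compiles both patterns with Go's RE2-based regexp package.
func newIgnoreUnless(pattern, unless string) (ignoreUnless, error) {
	p, err := regexp.Compile(pattern)
	if err != nil {
		return ignoreUnless{}, err
	}
	u, err := regexp.Compile(unless)
	if err != nil {
		return ignoreUnless{}, err
	}
	return ignoreUnless{pattern: p, unless: u}, nil
}

// shouldIgnore reports whether u matches pattern without also matching unless.
func (r ignoreUnless) shouldIgnore(u string) bool {
	return r.pattern.MatchString(u) && !r.unless.MatchString(u)
}

func main() {
	// The lookahead form is rejected at compile time.
	if _, err := regexp.Compile(`^(?!.*/catalog/).*page=`); err != nil {
		fmt.Println("lookahead rejected:", err)
	}

	rule, err := newIgnoreUnless(`page=`, `/catalog/`)
	if err != nil {
		fmt.Println("bad rule:", err)
		return
	}
	for _, u := range []string{
		"https://example.com/blog?page=2",
		"https://example.com/catalog/shoes?page=2",
		"https://example.com/about",
	} {
		fmt.Printf("%-42s ignore=%v\n", u, rule.shouldIgnore(u))
	}
}
```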

How it works

deadpoll is an HTTP-level crawler, not a browser. It fetches pages, parses the HTML for <a href="..."> links, and follows them. This makes it fast but means it:

- Does find all links present in the HTML, regardless of CSS visibility (hidden navs, collapsed menus, etc.)
- Does not execute JavaScript, so links injected by JS at runtime will be missed
- Does not check resources referenced in CSS (background-image: url(...), @font-face, etc.)
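
The first two points can be seen in a toy example. The sketch below pulls `href` values out of static HTML with a deliberately simplified regular expression (double-quoted attributes only), purely for illustration. deadpoll itself relies on colly's HTML parsing rather than a regex, and `extractLinks` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefRe is a simplified matcher for <a ... href="..."> in raw HTML. It only
// handles double-quoted attribute values and is not a substitute for a real
// HTML parser.
var hrefRe = regexp.MustCompile(`(?i)<a\s[^>]*href\s*=\s*"([^"]*)"`)

// extractLinks returns the href values of anchor tags found in the markup.
func extractLinks(html string) []string {
	var links []string
	for _, m := range hrefRe.FindAllStringSubmatch(html, -1) {
		links = append(links, m[1])
	}
	return links
}

func main() {
	page := `
<nav style="display:none"><a href="/hidden">Hidden</a></nav>
<a href="/visible">Visible</a>
<script>
  var a = document.createElement("a");
  a.setAttribute("href", "/from-js");
  document.body.appendChild(a);
</script>`

	// The CSS-hidden link is found because it is present in the markup.
	// The link created by the script is not, because no JavaScript runs.
	for _, l := range extractLinks(page) {
		fmt.Println(l)
	}
}
```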

Known limitations

- No JavaScript execution. Single-page apps or JS-rendered navigation won't be crawled. This is a deliberate trade-off for speed.
- No CSS resource checking. URLs inside stylesheets (url() references for images, fonts, etc.) are not currently checked. A prior Python-based tool caught these; adding CSS url() extraction is planned.
- No viewport/responsive awareness. Since deadpoll works at the HTTP/HTML level rather than rendering pages, viewport size is irrelevant: it sees the full DOM regardless of breakpoints.

License

MIT
