Documentation ¶
Index ¶
- Variables
- func ApplyStealth(page *rod.Page) error
- func ClickRef(page *rod.Page, ref string, snapshot *PageSnapshot) error
- func CloseTab(browser *rod.Browser, index int) (proto.TargetTargetID, error)
- func DetectAgent() bool
- func DismissCookieBanner(page *rod.Page) bool
- func EvalJS(page *rod.Page, expr string, elementRef string, snapshot *PageSnapshot) (string, error)
- func FormatCollect(r *CollectResult) string
- func FormatDiff(d SnapshotDiff) string
- func FormatErrors(errors []ErrorEntry) string
- func FormatMultiCollect(r *MultiCollectResult) string
- func FormatPreview(r *PreviewResult) string
- func FormatPreviewProfile(r *PreviewResult, p RenderProfile) string
- func FormatText(result *ExtractionResult) string
- func FormatTextProfile(result *ExtractionResult, p RenderProfile) string
- func HoverRef(page *rod.Page, ref string, snapshot *PageSnapshot) error
- func NewLauncher(opts LauncherOpts) *launcher.Launcher
- func PressKey(page *rod.Page, key string, ref string, snapshot *PageSnapshot) error
- func ResolveRef(page *rod.Page, ref string, snapshot *PageSnapshot) (*rod.Element, error)
- func SelectOption(page *rod.Page, ref string, values []string, snapshot *PageSnapshot) error
- func SetViewport(page *rod.Page, width, height int) error
- func SwitchTab(browser *rod.Browser, index int) (*rod.Page, error)
- func TakeScreenshot(page *rod.Page, fullPage bool, elementRef string, quality int, ...) ([]byte, error)
- func TruncateURL(u string, maxLen int) string
- func TypeRef(page *rod.Page, ref string, text string, snapshot *PageSnapshot) error
- func ValidateExtractLevel(level ExtractLevel) error
- func WaitForBotChallenge(page *rod.Page, timeout time.Duration) bool
- func WaitForPage(page *rod.Page, waitStrategy string) error
- func WaitForSelector(page *rod.Page, selector string, timeoutSec int) error
- type Browser
- func (b *Browser) Close()
- func (b *Browser) Connected() bool
- func (b *Browser) CurrentTargetID() string
- func (b *Browser) DeleteSnapshot(targetID proto.TargetTargetID) error
- func (b *Browser) Page() (*rod.Page, error)
- func (b *Browser) RodBrowser() *rod.Browser
- func (b *Browser) SaveSnapshot(page *rod.Page, result *ExtractionResult) error
- func (b *Browser) SetCurrentPage(page *rod.Page) error
- func (b *Browser) Snapshot(page *rod.Page) *PageSnapshot
- type CollectResult
- type CollectedItem
- type DialogResult
- type DiffEntry
- type DiffNode
- type DiffStats
- type ErrorCollector
- type ErrorEntry
- type ExtractLevel
- type ExtractedNode
- type ExtractionResult
- type ExtractionStats
- type LauncherOpts
- type MultiCollectResult
- type NetworkEntry
- type PageInfo
- type PageSnapshot
- type PreviewResult
- type PreviewSummary
- type RefSnapshot
- type RenderProfile
- type SiteResult
- type SnapshotDiff
- type TabInfo
Constants ¶
This section is empty.
Variables ¶
var ErrStaleRef = errors.New("stale ref: snapshot is missing or no longer matches the page")
ErrStaleRef indicates that a ref no longer maps to a live element.
Functions ¶
func ApplyStealth ¶
func ApplyStealth(page *rod.Page) error
ApplyStealth applies anti-detection patches to a page via CDP. Targets DataDome, Akamai, and similar bot-detection systems.
func ClickRef ¶
func ClickRef(page *rod.Page, ref string, snapshot *PageSnapshot) error
ClickRef clicks the element at the given ref.
func DetectAgent ¶ added in v0.2.0
func DetectAgent() bool
DetectAgent reports whether the current process runs inside an LLM agent. Detection order:
- GHOSTCHROME_PROFILE=agent|human (explicit override).
- GHOSTCHROME_AGENT=1|true (explicit opt-in).
- Known agent environment variables set by Claude Code, Cursor, Aider, Devin, Gemini CLI, and similar tools.
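The precedence above can be sketched as a small function. This is an illustration only, not the package's implementation: the `detectAgent` helper, its pluggable `getenv` parameter, and the `EXAMPLE_AGENT_SESSION` variable are hypothetical, and the real list of agent variables is not given in this documentation.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hypotheticalAgentVars is a placeholder list. The variables the package
// actually inspects are not enumerated in the documentation.
var hypotheticalAgentVars = []string{"EXAMPLE_AGENT_SESSION"}

// detectAgent applies the documented order: explicit profile override,
// then explicit opt-in, then known agent environment variables.
func detectAgent(getenv func(string) string) bool {
	// 1. GHOSTCHROME_PROFILE decides in either direction.
	switch strings.ToLower(getenv("GHOSTCHROME_PROFILE")) {
	case "agent":
		return true
	case "human":
		return false
	}
	// 2. GHOSTCHROME_AGENT opts in explicitly.
	switch strings.ToLower(getenv("GHOSTCHROME_AGENT")) {
	case "1", "true":
		return true
	}
	// 3. Any known agent variable being set counts as a match.
	for _, name := range hypotheticalAgentVars {
		if getenv(name) != "" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("agent:", detectAgent(os.Getenv))
}
```

Passing the lookup function in makes the precedence easy to exercise with a plain map instead of the real process environment.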
func DismissCookieBanner ¶
func DismissCookieBanner(page *rod.Page) bool
DismissCookieBanner attempts to find and click a cookie accept button. Returns true if a banner was found and dismissed.
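func EvalJS ¶
func EvalJS(page *rod.Page, expr string, elementRef string, snapshot *PageSnapshot) (string, error)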
EvalJS evaluates JavaScript on the page or in an element context. If elementRef is non-empty, the JS runs with `this` bound to that element.
func FormatCollect ¶ added in v0.2.0
func FormatCollect(r *CollectResult) string
FormatCollect renders a compact text table from collected items.
func FormatDiff ¶ added in v0.2.0
func FormatDiff(d SnapshotDiff) string
FormatDiff renders a SnapshotDiff as compact text.
func FormatErrors ¶
func FormatErrors(errors []ErrorEntry) string
FormatErrors formats errors as compact text lines.
func FormatMultiCollect ¶ added in v0.2.0
func FormatMultiCollect(r *MultiCollectResult) string
FormatMultiCollect renders a compact text report for multi-URL results.
func FormatPreview ¶
func FormatPreview(r *PreviewResult) string
FormatPreview renders a compact text report (human profile).
func FormatPreviewProfile ¶ added in v0.2.0
func FormatPreviewProfile(r *PreviewResult, p RenderProfile) string
FormatPreviewProfile renders the preview using the given profile. In agent mode, empty sections and zero-stat headers are dropped, failed requests are grouped by status code and the DOM dump uses one-letter role tags.
func FormatText ¶
func FormatText(result *ExtractionResult) string
FormatText renders the extraction result as compact text (human profile).
func FormatTextProfile ¶ added in v0.2.0
func FormatTextProfile(result *ExtractionResult, p RenderProfile) string
FormatTextProfile renders the extraction result using the given profile. The agent profile uses one-letter role tags, truncates long labels and shortens hrefs (see TruncateURL).
func HoverRef ¶
func HoverRef(page *rod.Page, ref string, snapshot *PageSnapshot) error
HoverRef hovers over the element at the given ref.
func NewLauncher ¶ added in v0.2.0
func NewLauncher(opts LauncherOpts) *launcher.Launcher
NewLauncher returns a configured launcher with the shared anti-detection flags used by both auto-launch (NewBrowser) and the `serve` command.
func PressKey ¶
func PressKey(page *rod.Page, key string, ref string, snapshot *PageSnapshot) error
PressKey sends a keyboard key press. If ref is non-empty, it focuses that element first.
func ResolveRef ¶
func ResolveRef(page *rod.Page, ref string, snapshot *PageSnapshot) (*rod.Element, error)
ResolveRef finds an element by its ref (@1, @2, etc.) using a persisted snapshot.
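As a rough sketch of the lookup step, a ref such as `@1` can be resolved against a snapshot map, with a sentinel error for missing entries. The `pageSnapshot` type, the `lookupRef` helper, and the integer node ID below are simplified stand-ins, and wrapping the sentinel with the ref name is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// errStaleRef mirrors the message of the package's ErrStaleRef variable.
var errStaleRef = errors.New("stale ref: snapshot is missing or no longer matches the page")

// pageSnapshot is a simplified stand-in: ref -> backend node ID.
type pageSnapshot struct {
	Refs map[string]int
}

// lookupRef returns the node ID for a ref, or an error that matches
// errStaleRef when the snapshot is missing or lacks the ref.
func lookupRef(ref string, snap *pageSnapshot) (int, error) {
	if snap == nil {
		return 0, errStaleRef
	}
	id, ok := snap.Refs[ref]
	if !ok {
		return 0, fmt.Errorf("%s: %w", ref, errStaleRef)
	}
	return id, nil
}

func main() {
	snap := &pageSnapshot{Refs: map[string]int{"@1": 42}}
	if _, err := lookupRef("@2", snap); errors.Is(err, errStaleRef) {
		fmt.Println("re-extract the page, then retry:", err)
	}
}
```

Callers that see a stale-ref error would typically re-run extraction to refresh the snapshot before retrying the action.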
func SelectOption ¶
func SelectOption(page *rod.Page, ref string, values []string, snapshot *PageSnapshot) error
SelectOption selects option(s) in a <select> element by visible text.
func SetViewport ¶
func SetViewport(page *rod.Page, width, height int) error
SetViewport overrides the page viewport dimensions.
func TakeScreenshot ¶
func TakeScreenshot(page *rod.Page, fullPage bool, elementRef string, quality int, snapshot *PageSnapshot) ([]byte, error)
TakeScreenshot captures the page or a specific element. If elementRef is non-empty, captures only that element. If fullPage is true, captures the full scrollable page. quality controls JPEG quality (1-100); PNG is used if quality <= 0.
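The parameter rules can be summarised in a small decision helper. This is a sketch under assumptions, not the package's code: `shotPlan` and `planScreenshot` are hypothetical names, letting `elementRef` take precedence over `fullPage` is a guess, and so is clamping quality above 100.

```go
package main

import "fmt"

// shotPlan describes what a screenshot call would capture and how.
type shotPlan struct {
	Target  string // "element", "full-page", or "viewport"
	Format  string // "png" or "jpeg"
	Quality int    // only meaningful for JPEG
}

// planScreenshot maps the documented parameters to a capture plan.
func planScreenshot(fullPage bool, elementRef string, quality int) shotPlan {
	p := shotPlan{Target: "viewport", Format: "png"}
	switch {
	case elementRef != "": // assumption: an element ref wins over fullPage
		p.Target = "element"
	case fullPage:
		p.Target = "full-page"
	}
	if quality > 0 {
		if quality > 100 { // assumption: out-of-range values are clamped
			quality = 100
		}
		p.Format = "jpeg"
		p.Quality = quality
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", planScreenshot(true, "", 80))
	fmt.Printf("%+v\n", planScreenshot(false, "@3", 0))
}
```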
func TruncateURL ¶ added in v0.2.0
func TruncateURL(u string, maxLen int) string
TruncateURL strips common scheme/www prefixes and shortens the URL to maxLen. Used by formatted output from preview and collect.
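A minimal sketch of this kind of shortening follows. The exact prefixes stripped, the `...` marker, and the treatment of a non-positive `maxLen` as "no limit" are assumptions, and `truncateURL` is a hypothetical stand-in rather than the package function.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateURL drops a leading scheme and "www." and then shortens the
// result to at most maxLen runes.
func truncateURL(u string, maxLen int) string {
	u = strings.TrimPrefix(u, "https://")
	u = strings.TrimPrefix(u, "http://")
	u = strings.TrimPrefix(u, "www.")
	r := []rune(u)
	if maxLen <= 0 || len(r) <= maxLen { // assumption: <= 0 disables truncation
		return u
	}
	if maxLen <= 3 { // too short to fit the marker
		return string(r[:maxLen])
	}
	return string(r[:maxLen-3]) + "..."
}

func main() {
	fmt.Println(truncateURL("https://www.example.com/listings/42", 20))
}
```

Working on runes rather than bytes avoids cutting a multi-byte character in half when the URL contains non-ASCII text.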
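func TypeRef ¶
func TypeRef(page *rod.Page, ref string, text string, snapshot *PageSnapshot) error
TypeRef types text into the element at the given ref. It focuses the element, selects all existing content, and then types via the keyboard so that it works with React, Vue, and Angular inputs.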
func ValidateExtractLevel ¶ added in v0.2.0
func ValidateExtractLevel(level ExtractLevel) error
ValidateExtractLevel ensures the extraction level is supported.
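Validation of a string-typed enum like this is commonly a switch over the known values. The sketch below assumes the three levels listed under ExtractLevel; the `validateLevel` name and the error text are hypothetical.

```go
package main

import "fmt"

// extractLevel mirrors the package's string-typed ExtractLevel.
type extractLevel string

const (
	levelSkeleton extractLevel = "skeleton"
	levelContent  extractLevel = "content"
	levelFull     extractLevel = "full"
)

// validateLevel returns nil for a known level and an error otherwise.
func validateLevel(level extractLevel) error {
	switch level {
	case levelSkeleton, levelContent, levelFull:
		return nil
	}
	return fmt.Errorf("unsupported extract level %q", level)
}

func main() {
	fmt.Println(validateLevel("content"))
	fmt.Println(validateLevel("everything"))
}
```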
func WaitForBotChallenge ¶ added in v0.2.0
func WaitForBotChallenge(page *rod.Page, timeout time.Duration) bool
WaitForBotChallenge detects bot-challenge pages (DataDome, Cloudflare, etc.) and waits for the challenge JS to resolve and the page to reload. Returns true if a challenge was detected and resolved.
func WaitForPage ¶ added in v0.2.0
func WaitForPage(page *rod.Page, waitStrategy string) error
WaitForPage applies a supported page wait strategy.
Types ¶
type Browser ¶
type Browser struct {
// contains filtered or unexported fields
}
Browser wraps a Rod browser with connect/launch logic.
func NewBrowser ¶
NewBrowser creates a browser instance. If connectURL is set, connects to an existing Chrome via CDP. Otherwise, auto-launches a new Chrome process.
func (*Browser) Close ¶
func (b *Browser) Close()
Close cleans up the browser resources. External Chrome keeps running; the CLI process owns the websocket lifetime.
func (*Browser) Connected ¶
func (b *Browser) Connected() bool
Connected reports whether the browser is connected to an external Chrome instance rather than one launched by this package.
func (*Browser) CurrentTargetID ¶ added in v0.2.0
func (b *Browser) CurrentTargetID() string
CurrentTargetID returns the persisted current tab target, if any.
func (*Browser) DeleteSnapshot ¶ added in v0.2.0
func (b *Browser) DeleteSnapshot(targetID proto.TargetTargetID) error
DeleteSnapshot removes stored ref state for a closed page target.
func (*Browser) Page ¶
func (b *Browser) Page() (*rod.Page, error)
Page returns the active page or creates a new one. When connected to an existing Chrome, it prefers the persisted active tab.
func (*Browser) RodBrowser ¶ added in v0.2.0
func (b *Browser) RodBrowser() *rod.Browser
RodBrowser returns the underlying rod.Browser for advanced operations.
func (*Browser) SaveSnapshot ¶ added in v0.2.0
func (b *Browser) SaveSnapshot(page *rod.Page, result *ExtractionResult) error
SaveSnapshot persists the latest ref snapshot for the page.
func (*Browser) SetCurrentPage ¶ added in v0.2.0
func (b *Browser) SetCurrentPage(page *rod.Page) error
SetCurrentPage marks the provided page as the current tab for the session.
type CollectResult ¶ added in v0.2.0
type CollectResult struct {
PageURL string `json:"page_url"`
ItemCount int `json:"item_count"`
Items []CollectedItem `json:"items"`
}
CollectResult holds all collected items from a listing page.
func Collect ¶ added in v0.2.0
func Collect(page *rod.Page, limit int) (*CollectResult, error)
Collect auto-detects repeated listing cards on a page and extracts structured data. It finds elements containing price patterns (€, $, £), groups them by common ancestor, and extracts title, price, URL, and metadata from each card.
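The price-detection step can be approximated with a regular expression over element text. This is only a sketch: the pattern and the `findPrice` helper are assumptions, the real matcher is not shown in this documentation, and the DOM grouping by common ancestor is not covered here.

```go
package main

import (
	"fmt"
	"regexp"
)

// priceRe matches a currency symbol before or after a number, allowing
// one optional whitespace character between them. Illustrative only.
var priceRe = regexp.MustCompile(`[€$£]\s?\d[\d.,]*|\d[\d.,]*\s?[€$£]`)

// findPrice returns the first price-like substring, or "" if none.
func findPrice(text string) string {
	return priceRe.FindString(text)
}

func main() {
	fmt.Println(findPrice("Only $49.99 today"))
	fmt.Println(findPrice("Flat in Paris 1.200 € per month"))
}
```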
type CollectedItem ¶ added in v0.2.0
type CollectedItem struct {
Title string `json:"title"`
Price string `json:"price,omitempty"`
URL string `json:"url,omitempty"`
Fields map[string]string `json:"fields,omitempty"`
}
CollectedItem is a single listing item extracted from a page.
type DialogResult ¶ added in v0.2.0
type DialogResult struct {
Handled bool `json:"handled"`
Action string `json:"action"`
Type string `json:"type,omitempty"`
Message string `json:"message,omitempty"`
URL string `json:"url,omitempty"`
DefaultPrompt string `json:"default_prompt,omitempty"`
TimedOut bool `json:"timed_out,omitempty"`
}
DialogResult describes how a JS dialog handler completed.
func HandleNextDialog ¶
func HandleNextDialog(page *rod.Page, accept bool, promptText string, timeout time.Duration) (*DialogResult, error)
HandleNextDialog waits for the next JavaScript dialog and handles it. The timeout is propagated via context so wait() unblocks cleanly on timeout and no goroutine is leaked.
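The timeout behavior described here follows a common Go pattern: wait on an event channel and a context in one `select`, so the waiter returns as soon as either fires. The sketch below uses a plain string channel in place of CDP dialog events; `waitWithTimeout` is a hypothetical helper, not part of the package.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitWithTimeout blocks until an event arrives or the timeout elapses.
// No extra goroutine is started, so nothing is left running on timeout.
func waitWithTimeout(events <-chan string, timeout time.Duration) (msg string, timedOut bool) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the timer as soon as we return

	select {
	case ev := <-events:
		return ev, false
	case <-ctx.Done():
		return "", true
	}
}

func main() {
	events := make(chan string) // nothing is ever sent on this channel
	_, timedOut := waitWithTimeout(events, 50*time.Millisecond)
	fmt.Println("timed out:", timedOut)
}
```

A result like the `TimedOut` field of DialogResult could be populated from the second return value in a design along these lines.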
type DiffNode ¶ added in v0.2.0
type DiffNode struct {
Ref string `json:"ref"`
Role string `json:"role"`
Name string `json:"name,omitempty"`
Href string `json:"href,omitempty"`
Value string `json:"value,omitempty"`
}
DiffNode is the minimal payload we return for added nodes.
type DiffStats ¶ added in v0.2.0
type DiffStats struct {
AddedCount int `json:"added"`
RemovedCount int `json:"removed"`
ChangedCount int `json:"changed"`
KeptCount int `json:"kept"`
}
DiffStats summarises a diff for agent consumption.
type ErrorCollector ¶
type ErrorCollector struct {
// contains filtered or unexported fields
}
ErrorCollector collects console-side errors from a page via CDP events.
func NewErrorCollector ¶
func NewErrorCollector(page *rod.Page) *ErrorCollector
NewErrorCollector creates a collector and starts listening on the page. It hooks into RuntimeConsoleAPICalled and RuntimeExceptionThrown.
func (*ErrorCollector) Errors ¶
func (c *ErrorCollector) Errors() []ErrorEntry
Errors returns a snapshot of all errors collected so far.
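One common way to provide a snapshot like this is to copy the slice under a mutex, so callers can read the result while event handlers keep appending. The sketch below assumes that design; the `collector` type and its string entries are simplified stand-ins for ErrorCollector and ErrorEntry.

```go
package main

import (
	"fmt"
	"sync"
)

// collector is a simplified stand-in that stores messages only.
type collector struct {
	mu      sync.Mutex
	entries []string
}

// add records one entry; event handlers would call this.
func (c *collector) add(e string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = append(c.entries, e)
}

// Errors returns a copy, so later additions do not alter the result
// and callers cannot modify the collector's internal slice.
func (c *collector) Errors() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.entries...)
}

func main() {
	c := &collector{}
	c.add("TypeError: x is undefined")
	snap := c.Errors()
	c.add("404 /favicon.ico")
	fmt.Println(len(snap), len(c.Errors()))
}
```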
type ErrorEntry ¶
type ErrorEntry struct {
Type string `json:"type"` // "console" or "network"
Level string `json:"level"` // "error", "warning", "4xx", "5xx"
Message string `json:"message"` // error message or URL
Source string `json:"source"` // file:line for console, URL for network
Status int `json:"status,omitempty"` // HTTP status for network errors
Method string `json:"method,omitempty"` // HTTP method for network
TimeMs int64 `json:"time_ms"` // timestamp relative to collector start
}
ErrorEntry represents a single console or network error.
type ExtractLevel ¶
type ExtractLevel string
ExtractLevel controls how much of the accessibility tree is returned.
const (
LevelSkeleton ExtractLevel = "skeleton"
LevelContent ExtractLevel = "content"
LevelFull ExtractLevel = "full"
)
type ExtractedNode ¶
type ExtractedNode struct {
Ref string `json:"ref,omitempty"`
Role string `json:"role"`
Name string `json:"name,omitempty"`
Value string `json:"value,omitempty"`
Level int `json:"level,omitempty"`
Href string `json:"href,omitempty"`
Type string `json:"type,omitempty"`
Checked *bool `json:"checked,omitempty"`
Disabled bool `json:"disabled,omitempty"`
BackendNodeID proto.DOMBackendNodeID `json:"-"`
Children []ExtractedNode `json:"children,omitempty"`
}
ExtractedNode represents a filtered accessibility node.
type ExtractionResult ¶
type ExtractionResult struct {
Nodes []ExtractedNode `json:"nodes"`
Refs map[string]ExtractedNode `json:"refs"`
Stats ExtractionStats `json:"stats"`
}
ExtractionResult holds the extraction output.
func Extract ¶
func Extract(page *rod.Page, level ExtractLevel, selector string) (*ExtractionResult, error)
Extract retrieves the accessibility tree from the page and filters it.
type ExtractionStats ¶
type ExtractionStats struct {
TotalNodes int `json:"total_nodes"`
FilteredNodes int `json:"filtered_nodes"`
InteractiveCount int `json:"interactive_count"`
}
ExtractionStats provides extraction metrics.
type LauncherOpts ¶ added in v0.2.0
LauncherOpts configures a stealth-flavored Chrome launcher.
type MultiCollectResult ¶ added in v0.2.0
type MultiCollectResult struct {
TotalItems int `json:"total_items"`
TotalTimeMs int64 `json:"total_time_ms"`
Sites []SiteResult `json:"sites"`
}
MultiCollectResult holds results from parallel multi-URL collection.
func MultiCollect ¶ added in v0.2.0
func MultiCollect(browser *rod.Browser, urls []string, limit int, stealth bool, maxParallel int) *MultiCollectResult
MultiCollect scrapes multiple URLs in parallel using separate browser tabs. Each URL gets its own tab, which navigates, collects, and then closes. maxParallel caps the number of concurrent tabs; a value <= 0 falls back to 5.
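The concurrency cap can be sketched with a buffered channel used as a semaphore. This is an illustration of the pattern, not the package's code: `collectAll`, `siteResult`, and the injected `fetch` function are hypothetical, with `fetch` standing in for opening a tab, navigating, and collecting.

```go
package main

import (
	"fmt"
	"sync"
)

// siteResult is a trimmed-down stand-in for SiteResult.
type siteResult struct {
	URL   string
	Count int
	Err   string
}

// collectAll runs fetch for every URL with at most maxParallel calls in
// flight. Results keep the input order.
func collectAll(urls []string, maxParallel int, fetch func(string) (int, error)) []siteResult {
	if maxParallel <= 0 {
		maxParallel = 5 // documented fallback
	}
	results := make([]siteResult, len(urls))
	sem := make(chan struct{}, maxParallel)
	var wg sync.WaitGroup

	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done

			n, err := fetch(u)
			r := siteResult{URL: u, Count: n}
			if err != nil {
				r.Err = err.Error()
			}
			results[i] = r // each goroutine writes its own index
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	fetch := func(u string) (int, error) { return len(u), nil }
	for _, r := range collectAll([]string{"a.example", "b.example"}, 0, fetch) {
		fmt.Println(r.URL, r.Count, r.Err)
	}
}
```

Recording a per-URL error string instead of aborting matches the shape of SiteResult, where one failing site does not prevent results from the others.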
type NetworkEntry ¶
type NetworkEntry struct {
Method string `json:"method,omitempty"`
URL string `json:"url"`
Status int `json:"status"`
Size int `json:"size_bytes"`
TimeMs int64 `json:"time_ms"`
MimeType string `json:"mime_type,omitempty"`
Error string `json:"error,omitempty"`
}
NetworkEntry represents a captured network request.
type PageInfo ¶
type PageInfo struct {
URL string `json:"url"`
Title string `json:"title"`
Status int `json:"status"`
TimeMs int64 `json:"time_ms"`
}
PageInfo holds the result of a navigation.
type PageSnapshot ¶ added in v0.2.0
type PageSnapshot struct {
TargetID string `json:"target_id"`
URL string `json:"url,omitempty"`
Title string `json:"title,omitempty"`
Refs map[string]RefSnapshot `json:"refs,omitempty"`
}
PageSnapshot stores the last known interactive refs for a page target.
func BuildSnapshot ¶ added in v0.2.0
func BuildSnapshot(page *rod.Page, result *ExtractionResult) (*PageSnapshot, error)
BuildSnapshot creates an in-memory ref snapshot from an extraction result.
type PreviewResult ¶
type PreviewResult struct {
PageInfo *PageInfo `json:"page"`
Errors []ErrorEntry `json:"errors"`
Network []NetworkEntry `json:"network"`
DOM *ExtractionResult `json:"dom"`
Summary PreviewSummary `json:"summary"`
}
PreviewResult is the all-in-one dev report for a page.
type PreviewSummary ¶
type PreviewSummary struct {
TotalRequests int `json:"total_requests"`
FailedRequests int `json:"failed_requests"`
ErrorCount int `json:"error_count"`
WarningCount int `json:"warning_count"`
InteractiveCount int `json:"interactive_count"`
}
PreviewSummary provides quick stats.
type RefSnapshot ¶ added in v0.2.0
type RefSnapshot struct {
BackendNodeID proto.DOMBackendNodeID `json:"backend_node_id"`
Role string `json:"role,omitempty"`
Name string `json:"name,omitempty"`
}
RefSnapshot stores a stable backend node mapping for a single ref.
type RenderProfile ¶ added in v0.2.0
type RenderProfile struct {
// Agent is true when the output is being consumed by an LLM agent runner
// (Claude Code, Cursor, Aider, etc.) rather than a human terminal.
Agent bool
// Format is "text" or "json".
Format string
// MaxLabelLen truncates node names / values to this length in agent mode.
// 0 means no truncation.
MaxLabelLen int
// AbbrevRoles uses 1-2 character role abbreviations (b/a/t/c/s/r/m/x/h).
AbbrevRoles bool
// DropEmptyStats omits "[errors] 0 ..." / "[network] ... 0 failed" headers
// when counts are zero.
DropEmptyStats bool
}
RenderProfile controls how output is rendered for the calling environment. It is resolved once per CLI invocation (see ResolveProfile) and then threaded into formatters.
func ProfileAgent ¶ added in v0.2.0
func ProfileAgent(format string) RenderProfile
ProfileAgent returns the compact agent-optimised profile.
func ProfileHuman ¶ added in v0.2.0
func ProfileHuman(format string) RenderProfile
ProfileHuman returns the default human-friendly profile.
func ResolveProfile ¶ added in v0.2.0
func ResolveProfile(explicit, format string) RenderProfile
ResolveProfile picks a RenderProfile from an explicit flag ("auto", "human", "agent") with environment-variable fallback for "auto".
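The selection logic might look roughly like the sketch below. The `renderProfile` struct is trimmed to two fields, the `detect` callback stands in for environment detection, and treating any value other than "human" or "agent" as "auto" is an assumption.

```go
package main

import "fmt"

// renderProfile is a reduced stand-in for RenderProfile.
type renderProfile struct {
	Agent  bool
	Format string
}

// resolveProfile honours an explicit choice and otherwise defers to
// detect, which represents the environment-variable fallback.
func resolveProfile(explicit, format string, detect func() bool) renderProfile {
	switch explicit {
	case "agent":
		return renderProfile{Agent: true, Format: format}
	case "human":
		return renderProfile{Agent: false, Format: format}
	default: // "auto" (assumption: unknown values behave the same way)
		return renderProfile{Agent: detect(), Format: format}
	}
}

func main() {
	inAgent := func() bool { return true }
	fmt.Printf("%+v\n", resolveProfile("auto", "text", inAgent))
	fmt.Printf("%+v\n", resolveProfile("human", "json", inAgent))
}
```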
type SiteResult ¶ added in v0.2.0
type SiteResult struct {
URL string `json:"url"`
Items []CollectedItem `json:"items"`
Count int `json:"count"`
TimeMs int64 `json:"time_ms"`
Error string `json:"error,omitempty"`
}
SiteResult holds the collect result for a single URL in a multi-collect.
type SnapshotDiff ¶ added in v0.2.0
type SnapshotDiff struct {
Unchanged bool `json:"unchanged,omitempty"`
Added []DiffNode `json:"added,omitempty"`
Removed []string `json:"removed,omitempty"`
Changed map[string]DiffEntry `json:"changed,omitempty"`
Stats DiffStats `json:"stats"`
}
SnapshotDiff reports the changes between two ref maps of a page. All fields are optional in JSON output so an unchanged page serialises to `{"unchanged":true}`.
func DiffRefs ¶ added in v0.2.0
func DiffRefs(prev, curr map[string]RefSnapshot) SnapshotDiff
DiffRefs compares two ref maps (typically the persisted PageSnapshot.Refs). Refs are reassigned in document order by the extractor, so a key match indicates the same logical node slot. A role or name change on the same key counts as "changed"; disappearing or new keys count as removed/added.
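The comparison rules can be sketched over simplified types. The `refSnapshot` and `diffResult` types below are reduced stand-ins for the package's RefSnapshot and SnapshotDiff, and sorting the output slices is an addition to keep results deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// refSnapshot keeps only the fields that the comparison looks at.
type refSnapshot struct {
	Role string
	Name string
}

// diffResult lists ref keys per category instead of full payloads.
type diffResult struct {
	Unchanged bool
	Added     []string
	Removed   []string
	Changed   []string
	Kept      int
}

// diffRefs classifies every key: new keys are added, missing keys are
// removed, and a role or name change on the same key is changed.
func diffRefs(prev, curr map[string]refSnapshot) diffResult {
	var d diffResult
	for ref, c := range curr {
		p, ok := prev[ref]
		switch {
		case !ok:
			d.Added = append(d.Added, ref)
		case p.Role != c.Role || p.Name != c.Name:
			d.Changed = append(d.Changed, ref)
		default:
			d.Kept++
		}
	}
	for ref := range prev {
		if _, ok := curr[ref]; !ok {
			d.Removed = append(d.Removed, ref)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Changed)
	d.Unchanged = len(d.Added)+len(d.Removed)+len(d.Changed) == 0
	return d
}

func main() {
	prev := map[string]refSnapshot{"@1": {"button", "Save"}, "@2": {"link", "Home"}}
	curr := map[string]refSnapshot{"@1": {"button", "Saved"}, "@3": {"link", "Help"}}
	fmt.Printf("%+v\n", diffRefs(prev, curr))
}
```

Because refs are reassigned in document order, inserting a node near the top of a page shifts later keys, so a diff like this can report many "changed" entries for what is logically one insertion.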