crawler

package crawler

Version: v0.1.1 (Latest)
Warning

This package is not in the latest version of its module.

Published: Mar 2, 2026 | License: MIT | Imports: 9 | Imported by: 0

Documentation

Overview

Package crawler provides website crawling with SPA detection.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	// Maximum crawl depth
	MaxDepth int

	// Maximum pages to crawl
	MaxPages int

	// Include patterns (glob)
	IncludePatterns []string

	// Exclude patterns (glob)
	ExcludePatterns []string

	// Wait for SPA hydration
	WaitForSPA bool

	// SPA framework indicators
	SPAIndicators []string

	// Respect robots.txt
	RespectRobots bool

	// Use sitemap.xml
	UseSitemap bool

	// Delay between requests
	Delay time.Duration

	// Request timeout
	Timeout time.Duration

	// Concurrency limit
	Concurrency int
}

Config configures the crawler.
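IncludePatterns and ExcludePatterns are documented only as glob patterns. The sketch below shows one way such filtering could work. It assumes patterns are matched against the URL path using Go's path.Match semantics, that an empty include list admits every path, and that an exclude match always wins. These are assumptions, not this package's documented rules, and allowURL is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// allowURL is a hypothetical helper that reports whether rawURL passes the
// include/exclude glob filters. Assumptions: patterns are matched against the
// URL path with path.Match, an empty include list admits every path, and any
// exclude match rejects the URL regardless of the include list.
func allowURL(rawURL string, include, exclude []string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	p := u.Path
	if p == "" {
		p = "/"
	}
	for _, pat := range exclude {
		ok, err := path.Match(pat, p)
		if err != nil {
			return false, err // malformed pattern
		}
		if ok {
			return false, nil
		}
	}
	if len(include) == 0 {
		return true, nil
	}
	for _, pat := range include {
		ok, err := path.Match(pat, p)
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	include := []string{"/docs/*"}
	exclude := []string{"/docs/*.pdf"}
	for _, u := range []string{
		"https://example.com/docs/intro",
		"https://example.com/docs/manual.pdf",
		"https://example.com/blog/post",
	} {
		ok, err := allowURL(u, include, exclude)
		fmt.Println(u, ok, err)
	}
}
```

Note that path.Match's `*` does not cross `/`, so `/docs/*` would not match `/docs/a/b`; a crawler that needs recursive patterns would likely need a `**`-aware matcher instead.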

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns default crawler configuration.

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

Crawler crawls websites discovering pages for accessibility auditing.

func NewCrawler

func NewCrawler(vibe *vibium.Vibe, logger *slog.Logger, config Config) *Crawler

NewCrawler creates a new crawler.

func (*Crawler) Crawl

func (c *Crawler) Crawl(ctx context.Context, startURL string) (*Result, error)

Crawl crawls a website starting from the given URL.
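The traversal strategy behind Crawl is not documented. One plausible shape is a breadth-first walk bounded by MaxDepth and MaxPages, where each page records its depth and the URL it was discovered from, mirroring Page.Depth and Page.DiscoveredFrom. The sketch below assumes that strategy and that depth 0 is the start URL. A hard-coded link map stands in for fetching pages through the browser, and crawlBFS and the local page type are hypothetical.

```go
package main

import "fmt"

// page mirrors a subset of the documented crawler.Page fields.
type page struct {
	URL            string
	Depth          int
	DiscoveredFrom string
}

// crawlBFS is a hypothetical breadth-first crawl. links stands in for loading
// a page and extracting its links. Pages at maxDepth are recorded but not
// expanded, and the walk stops once maxPages pages have been collected.
func crawlBFS(start string, links map[string][]string, maxDepth, maxPages int) []page {
	seen := map[string]bool{start: true}
	queue := []page{{URL: start, Depth: 0}}
	var out []page
	for len(queue) > 0 && len(out) < maxPages {
		cur := queue[0]
		queue = queue[1:]
		out = append(out, cur)
		if cur.Depth >= maxDepth {
			continue // do not follow links beyond the depth limit
		}
		for _, next := range links[cur.URL] {
			if seen[next] {
				continue // already queued or visited
			}
			seen[next] = true
			queue = append(queue, page{URL: next, Depth: cur.Depth + 1, DiscoveredFrom: cur.URL})
		}
	}
	return out
}

func main() {
	links := map[string][]string{
		"/":      {"/about", "/docs"},
		"/docs":  {"/docs/a", "/"},
		"/about": {"/team"},
	}
	for _, p := range crawlBFS("/", links, 1, 10) {
		fmt.Printf("%s depth=%d from=%q\n", p.URL, p.Depth, p.DiscoveredFrom)
	}
}
```

Breadth-first order means a MaxPages cutoff keeps the pages closest to the start URL, which is usually what an audit wants when the budget runs out.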

func (*Crawler) GetSPAFramework

func (c *Crawler) GetSPAFramework(ctx context.Context) string

GetSPAFramework returns the SPA framework detected on the current page.

func (*Crawler) IsSPAPage

func (c *Crawler) IsSPAPage(ctx context.Context) bool

IsSPAPage reports whether the current page is a single-page application (SPA).
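IsSPAPage and GetSPAFramework take only a context, so they presumably inspect whatever page the crawler's browser session currently has loaded. How they decide is not documented. The sketch below assumes a simple approach in the spirit of Config.SPAIndicators: scan the page HTML for framework-specific marker strings. The markers are commonly seen fingerprints (such as Next.js's `__NEXT_DATA__` script and Angular's `ng-version` attribute), not this package's actual indicator list, and detectFramework is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// indicator pairs a framework name with an HTML substring assumed to identify it.
type indicator struct {
	Framework string
	Marker    string
}

// defaultIndicators is an illustrative list, ordered so that meta-frameworks
// (Next.js, Nuxt) are checked before the libraries they build on (React, Vue).
var defaultIndicators = []indicator{
	{"nextjs", "__NEXT_DATA__"},
	{"nuxt", "__NUXT__"},
	{"angular", "ng-version"},
	{"react", "data-reactroot"},
	{"vue", "data-v-app"},
}

// detectFramework returns the first framework whose marker appears in html,
// or "" if none matches. An SPA check is then simply a non-empty result.
func detectFramework(html string, inds []indicator) string {
	for _, ind := range inds {
		if strings.Contains(html, ind.Marker) {
			return ind.Framework
		}
	}
	return ""
}

func main() {
	pages := []string{
		`<script id="__NEXT_DATA__" type="application/json">{}</script>`,
		`<app-root ng-version="17.0.0"></app-root>`,
		`<html><body><p>static</p></body></html>`,
	}
	for _, html := range pages {
		fw := detectFramework(html, defaultIndicators)
		fmt.Printf("framework=%q spa=%v\n", fw, fw != "")
	}
}
```

Many of these markers only appear after client-side hydration, which is presumably why Config offers WaitForSPA; scanning the raw server response alone could miss them.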

type Page

type Page struct {
	URL            string        `json:"url"`
	Title          string        `json:"title"`
	Depth          int           `json:"depth"`
	DiscoveredFrom string        `json:"discoveredFrom"`
	IsSPA          bool          `json:"isSPA"`
	SPAFramework   string        `json:"spaFramework,omitempty"`
	LoadTime       time.Duration `json:"loadTime"`
	Links          []string      `json:"links"`
	Error          string        `json:"error,omitempty"`
}

Page represents a discovered page.

type Result

type Result struct {
	StartURL    string        `json:"startUrl"`
	Pages       []Page        `json:"pages"`
	TotalPages  int           `json:"totalPages"`
	Duration    time.Duration `json:"duration"`
	RobotsTxt   string        `json:"robotsTxt,omitempty"`
	SitemapURLs []string      `json:"sitemapUrls,omitempty"`
}

Result contains the crawl results.
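Result.SitemapURLs is presumably filled when Config.UseSitemap is set, but how the sitemap is fetched and parsed is not documented. Parsing a standard sitemaps.org file is straightforward with encoding/xml, since each page URL sits in a `<loc>` element under `<urlset><url>`. The sketch below assumes a plain urlset (not a sitemap index file) that has already been downloaded; parseSitemap is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// urlset models the <urlset> root of a sitemaps.org sitemap; only <loc> is kept.
// Untagged namespaces match by local name, so the sitemap xmlns is accepted.
type urlset struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// parseSitemap extracts page URLs from a sitemap document, trimming whitespace
// and skipping empty <loc> values. Sitemap index files are not handled.
func parseSitemap(data []byte) ([]string, error) {
	var s urlset
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(s.URLs))
	for _, u := range s.URLs {
		if loc := strings.TrimSpace(u.Loc); loc != "" {
			out = append(out, loc)
		}
	}
	return out, nil
}

func main() {
	doc := []byte(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc><lastmod>2026-01-01</lastmod></url>
</urlset>`)
	urls, err := parseSitemap(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(urls) // [https://example.com/ https://example.com/about]
}
```

A sitemap index (`<sitemapindex>` with nested `<sitemap><loc>` entries) would decode to an empty list here, so a fuller version would detect the root element and fetch each child sitemap.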
