analysis

package
v0.0.1 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 13, 2026 License: Apache-2.0 Imports: 18 Imported by: 0

Documentation

Overview

Package analysis provides advanced content analysis engines for crawled pages.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CheckCWVIssues

func CheckCWVIssues(results []model.LighthouseResult) []model.Issue

CheckCWVIssues examines the aggregated Lighthouse results and returns issues for poor site-wide scores.

func SortedCategories

func SortedCategories(sr *ScoreResult) []string

SortedCategories returns the category names sorted alphabetically. This is useful for deterministic output ordering.

Types

type CWVSummary

type CWVSummary struct {
	PageCount          int        `json:"page_count"`
	AvgPerformance     float64    `json:"avg_performance"`
	AvgAccessibility   float64    `json:"avg_accessibility"`
	AvgSEO             float64    `json:"avg_seo"`
	AvgBestPractices   float64    `json:"avg_best_practices"`
	WorstPerformance   []URLScore `json:"worst_performance"`
	WorstAccessibility []URLScore `json:"worst_accessibility"`
	PassRate           float64    `json:"pass_rate"`
}

CWVSummary aggregates Core Web Vitals / Lighthouse scores across all audited pages.

func AggregateCWV

func AggregateCWV(results []model.LighthouseResult) *CWVSummary

AggregateCWV computes aggregate Lighthouse scores from the given results. It returns nil if results is empty.
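
The nil-on-empty contract can be sketched as follows. Note this is an illustrative sketch: a plain []float64 stands in for []model.LighthouseResult, whose fields are not shown in this documentation.

```go
package main

import "fmt"

// aggregatePerformance returns the mean of the given scores, or nil for
// empty input, mirroring AggregateCWV's documented nil-on-empty behavior.
func aggregatePerformance(scores []float64) *float64 {
	if len(scores) == 0 {
		return nil
	}
	sum := 0.0
	for _, s := range scores {
		sum += s
	}
	avg := sum / float64(len(scores))
	return &avg
}

func main() {
	if avg := aggregatePerformance([]float64{80, 90, 100}); avg != nil {
		fmt.Printf("avg performance: %.1f\n", *avg) // 90.0
	}
	fmt.Println(aggregatePerformance(nil) == nil) // true: empty input -> nil
}
```

Callers should check for nil before dereferencing, exactly as with AggregateCWV's result.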

type ClassifiedPage

type ClassifiedPage struct {
	URL   string    `json:"url"`
	Class PageClass `json:"class"`
	Score float64   `json:"score"` // confidence 0-1
}

ClassifiedPage pairs a page URL with its detected class and confidence.

func ClassifyPages

func ClassifyPages(pages []*model.Page) []ClassifiedPage

ClassifyPages classifies each page by applying heuristic rules and returning the highest-scoring class. Pages that match no rules are classified as "other" with a score of 0.
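
The rule-scoring approach can be sketched over URL paths alone. The prefixes and confidence values below are hypothetical, not the package's actual rule set; only the fall-through to "other" with score 0 comes from the documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyPath applies heuristic prefix rules and returns the
// highest-scoring class; unmatched paths fall through to "other" with 0.
func classifyPath(path string) (string, float64) {
	if path == "" || path == "/" {
		return "homepage", 1.0
	}
	rules := []struct {
		class  string
		prefix string
		score  float64
	}{
		{"blog", "/blog", 0.9},
		{"product", "/product", 0.9},
		{"category", "/category", 0.85},
		{"contact", "/contact", 0.95},
		{"legal", "/privacy", 0.9},
	}
	best, bestScore := "other", 0.0
	for _, r := range rules {
		if strings.HasPrefix(path, r.prefix) && r.score > bestScore {
			best, bestScore = r.class, r.score
		}
	}
	return best, bestScore
}

func main() {
	for _, p := range []string{"/", "/blog/post-1", "/pricing"} {
		c, s := classifyPath(p)
		fmt.Printf("%s -> %s (%.2f)\n", p, c, s)
	}
}
```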

type ContentAnalyzer

type ContentAnalyzer struct{}

ContentAnalyzer checks per-page content quality metrics.

func NewContentAnalyzer

func NewContentAnalyzer() *ContentAnalyzer

NewContentAnalyzer returns a new ContentAnalyzer.

func (*ContentAnalyzer) Check

func (c *ContentAnalyzer) Check(_ context.Context, page *model.Page) []model.Issue

Check runs per-page content quality checks.

func (*ContentAnalyzer) Name

func (c *ContentAnalyzer) Name() string

Name returns the checker name.

type CrawlBudgetAnalyzer

type CrawlBudgetAnalyzer struct{}

CrawlBudgetAnalyzer detects crawl budget waste across a site.

func NewCrawlBudgetAnalyzer

func NewCrawlBudgetAnalyzer() *CrawlBudgetAnalyzer

NewCrawlBudgetAnalyzer returns a new CrawlBudgetAnalyzer.

func (*CrawlBudgetAnalyzer) Check

func (c *CrawlBudgetAnalyzer) Check(_ context.Context, _ *model.Page) []model.Issue

Check returns nil; all analysis is site-wide.

func (*CrawlBudgetAnalyzer) CheckSite

func (c *CrawlBudgetAnalyzer) CheckSite(_ context.Context, pages []*model.Page) []model.Issue

CheckSite runs crawl budget analysis across all pages.

func (*CrawlBudgetAnalyzer) Name

func (c *CrawlBudgetAnalyzer) Name() string

Name returns the checker name.

type DuplicateDetector

type DuplicateDetector struct {
	// ThinContentThreshold is the minimum word count below which a page is
	// flagged as thin content. Defaults to defaultThinContentThreshold.
	ThinContentThreshold int
}

DuplicateDetector finds exact duplicates, near-duplicates, and thin content across a set of crawled pages.

func NewDuplicateDetector

func NewDuplicateDetector() *DuplicateDetector

NewDuplicateDetector returns a DuplicateDetector with default settings.

func (*DuplicateDetector) Analyze

func (d *DuplicateDetector) Analyze(pages []*model.Page) []model.Issue

Analyze examines all pages for exact duplicates, near-duplicates, and thin content.
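
Exact-duplicate grouping can be sketched by bucketing pages on a content hash. Hashing lowercased, whitespace-collapsed body text is an assumption about how exact duplicates could be detected; the package's actual method is not documented here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// groupExactDuplicates buckets URLs by a SHA-256 hash of their
// normalized body text; any bucket with more than one URL is an
// exact-duplicate group.
func groupExactDuplicates(bodies map[string]string) map[[32]byte][]string {
	groups := make(map[[32]byte][]string)
	for url, body := range bodies {
		norm := strings.Join(strings.Fields(strings.ToLower(body)), " ")
		h := sha256.Sum256([]byte(norm))
		groups[h] = append(groups[h], url)
	}
	return groups
}

func main() {
	groups := groupExactDuplicates(map[string]string{
		"/a": "Hello   World",
		"/b": "hello world",
		"/c": "something else",
	})
	for _, urls := range groups {
		if len(urls) > 1 {
			fmt.Println("exact duplicates:", urls)
		}
	}
}
```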

func (*DuplicateDetector) Check

func (d *DuplicateDetector) Check(_ context.Context, _ *model.Page) []model.Issue

Check returns nil; all analysis is site-wide.

func (*DuplicateDetector) CheckSite

func (d *DuplicateDetector) CheckSite(_ context.Context, pages []*model.Page) []model.Issue

CheckSite runs duplicate and thin content detection across all pages.

func (*DuplicateDetector) Name

func (d *DuplicateDetector) Name() string

Name returns the checker name.

type LinkGraph

type LinkGraph struct {
	// contains filtered or unexported fields
}

LinkGraph represents the internal link structure of a website.

func BuildGraph

func BuildGraph(pages []*model.Page) *LinkGraph

BuildGraph constructs a LinkGraph from a set of crawled pages. Only internal links (same host) are included in the graph.

func (*LinkGraph) ComputePageRank

func (g *LinkGraph) ComputePageRank(iterations int, dampingFactor float64) map[string]float64

ComputePageRank runs the iterative PageRank algorithm.

Parameters:

  • iterations: number of iterations to run (use 0 for default of 20)
  • dampingFactor: the damping factor d (use 0 for default of 0.85)

Returns a map of URL to PageRank score.
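
The iteration with the documented defaults can be sketched as a self-contained implementation. This simplified version does not redistribute rank mass from dangling nodes (pages with no outgoing links), which the package may handle differently.

```go
package main

import "fmt"

// pageRank runs the iterative PageRank update over an adjacency list,
// using the documented defaults: 20 iterations and damping factor 0.85.
func pageRank(links map[string][]string, iterations int, d float64) map[string]float64 {
	if iterations <= 0 {
		iterations = 20
	}
	if d == 0 {
		d = 0.85
	}
	// Collect all nodes, including link targets with no outgoing links.
	nodes := map[string]bool{}
	for from, tos := range links {
		nodes[from] = true
		for _, to := range tos {
			nodes[to] = true
		}
	}
	n := float64(len(nodes))
	rank := make(map[string]float64, len(nodes))
	for u := range nodes {
		rank[u] = 1 / n
	}
	for i := 0; i < iterations; i++ {
		next := make(map[string]float64, len(nodes))
		for u := range nodes {
			next[u] = (1 - d) / n // teleportation term
		}
		for from, tos := range links {
			if len(tos) == 0 {
				continue
			}
			share := rank[from] / float64(len(tos))
			for _, to := range tos {
				next[to] += d * share
			}
		}
		rank = next
	}
	return rank
}

func main() {
	g := map[string][]string{"/": {"/a", "/b"}, "/a": {"/"}, "/b": {"/a"}}
	ranks := pageRank(g, 0, 0)
	for _, u := range []string{"/", "/a", "/b"} {
		fmt.Printf("%-3s %.3f\n", u, ranks[u])
	}
}
```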

func (*LinkGraph) Edges

func (g *LinkGraph) Edges() int

Edges returns the total number of directed edges in the graph.

func (*LinkGraph) ExportDOT

func (g *LinkGraph) ExportDOT(w io.Writer) error

ExportDOT writes the link graph in Graphviz DOT format to w.

func (*LinkGraph) ExportJSON

func (g *LinkGraph) ExportJSON(w io.Writer) error

ExportJSON writes the link graph as a JSON adjacency list to w.

func (*LinkGraph) Nodes

func (g *LinkGraph) Nodes() int

Nodes returns the number of nodes in the graph.

func (*LinkGraph) OutgoingLinks

func (g *LinkGraph) OutgoingLinks(url string) []string

OutgoingLinks returns the outgoing internal links for a URL.

type LinkGraphChecker

type LinkGraphChecker struct{}

LinkGraphChecker wraps LinkGraph analysis as a SiteChecker for the audit registry.

func NewLinkGraphChecker

func NewLinkGraphChecker() *LinkGraphChecker

NewLinkGraphChecker returns a new LinkGraphChecker.

func (*LinkGraphChecker) Check

func (c *LinkGraphChecker) Check(_ context.Context, _ *model.Page) []model.Issue

Check returns nil; all analysis is site-wide.

func (*LinkGraphChecker) CheckSite

func (c *LinkGraphChecker) CheckSite(_ context.Context, pages []*model.Page) []model.Issue

CheckSite builds the link graph, computes PageRank, and generates issues.

func (*LinkGraphChecker) Name

func (c *LinkGraphChecker) Name() string

Name returns the checker name.

type PageClass

type PageClass string

PageClass represents the classification of a page.

const (
	ClassHomepage PageClass = "homepage"
	ClassBlog     PageClass = "blog"
	ClassProduct  PageClass = "product"
	ClassCategory PageClass = "category"
	ClassContact  PageClass = "contact"
	ClassAbout    PageClass = "about"
	ClassLegal    PageClass = "legal"
	ClassAPI      PageClass = "api"
	ClassOther    PageClass = "other"
)

type RedirectChecker

type RedirectChecker struct{}

RedirectChecker implements SiteChecker for redirect-related issues.

func NewRedirectChecker

func NewRedirectChecker() *RedirectChecker

NewRedirectChecker returns a new RedirectChecker.

func (*RedirectChecker) Check

func (c *RedirectChecker) Check(_ context.Context, _ *model.Page) []model.Issue

Check returns nil; all redirect analysis is site-wide.

func (*RedirectChecker) CheckSite

func (c *RedirectChecker) CheckSite(_ context.Context, pages []*model.Page) []model.Issue

CheckSite runs site-wide redirect analysis and returns any issues found.

func (*RedirectChecker) Name

func (c *RedirectChecker) Name() string

Name returns the checker name.

type RedirectEntry

type RedirectEntry struct {
	From     string   `json:"from"`
	To       string   `json:"to"`
	Chain    []string `json:"chain"`
	Hops     int      `json:"hops"`
	LinkedBy []string `json:"linked_by"`
}

RedirectEntry represents a single redirect, including its full chain, hop count, and the set of pages that link to the redirecting URL.

type RedirectMap

type RedirectMap struct {
	Entries []RedirectEntry `json:"entries"`
}

RedirectMap holds all redirect entries discovered during analysis.

func AnalyzeRedirects

func AnalyzeRedirects(pages []*model.Page) *RedirectMap

AnalyzeRedirects scans all pages for non-empty RedirectChain, builds redirect entries with full chain info, cross-references pages that link to the redirecting URL, and returns the result sorted by hop count descending.

func (*RedirectMap) ExportCSV

func (m *RedirectMap) ExportCSV(w io.Writer) error

ExportCSV writes the redirect map as CSV to the given writer. The header row is: from,to,chain,hops,linked_by.

type ScoreResult

type ScoreResult struct {
	Overall     int            `json:"overall"`    // 0-100
	Categories  map[string]int `json:"categories"` // category -> score
	Breakdown   map[string]int `json:"breakdown"`  // severity -> count
	TotalPages  int            `json:"total_pages"`
	TotalIssues int            `json:"total_issues"`
}

ScoreResult holds the computed health score for a crawled site.

func ComputeScore

func ComputeScore(result *model.CrawlResult) *ScoreResult

ComputeScore calculates a 0-100 aggregate health score from the issues in a CrawlResult. The overall score starts at 100 and is decremented per issue (critical=-10, warning=-3, info=-1), floored at 0. Category scores are computed independently using the same algorithm.
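
The documented per-severity penalties and the floor at 0 reduce to a few lines; this sketch works from severity counts rather than a *model.CrawlResult, whose shape is not shown here.

```go
package main

import "fmt"

// scoreFromCounts applies the documented scoring rule: start at 100,
// subtract 10 per critical, 3 per warning, and 1 per info issue,
// flooring the result at 0.
func scoreFromCounts(critical, warning, info int) int {
	score := 100 - 10*critical - 3*warning - 1*info
	if score < 0 {
		score = 0
	}
	return score
}

func main() {
	fmt.Println(scoreFromCounts(2, 5, 3))  // 100 - 20 - 15 - 3 = 62
	fmt.Println(scoreFromCounts(12, 0, 0)) // -20 floored to 0
}
```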

type Technology

type Technology struct {
	Name     string `json:"name"`
	Category string `json:"category"` // cms, framework, analytics, cdn, etc.
	Version  string `json:"version,omitempty"`
	Evidence string `json:"evidence"` // what signal detected it
}

Technology represents a detected technology on a website.

func DetectTechnologies

func DetectTechnologies(pages []*model.Page) []Technology

DetectTechnologies scans the provided pages for known technology signatures and returns a deduplicated, sorted list of detected technologies.

Body and asset checks primarily use the first (homepage) page; header checks are applied across all pages for CDN detection.
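
A header-based CDN check of the kind described above might look like this. The function and its signature table are hypothetical; only the two header signals themselves are genuine (Cloudflare sets `Server: cloudflare`, and CloudFront responses carry `X-Amz-Cf-Id`).

```go
package main

import (
	"fmt"
	"strings"
)

// detectCDN inspects response headers for well-known CDN signatures and
// reports the detected name along with the evidence that triggered it.
func detectCDN(headers map[string]string) (name, evidence string, ok bool) {
	switch {
	case strings.Contains(strings.ToLower(headers["Server"]), "cloudflare"):
		return "Cloudflare", "Server: " + headers["Server"], true
	case headers["X-Amz-Cf-Id"] != "":
		return "CloudFront", "X-Amz-Cf-Id header", true
	}
	return "", "", false
}

func main() {
	if name, ev, ok := detectCDN(map[string]string{"Server": "cloudflare"}); ok {
		fmt.Printf("detected %s via %s\n", name, ev)
	}
}
```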

type URLScore

type URLScore struct {
	URL   string  `json:"url"`
	Score float64 `json:"score"`
}

URLScore pairs a URL with a numeric score.
