Documentation ¶
Overview ¶
Package analysis provides advanced content analysis engines for crawled pages.
Index ¶
- func CheckCWVIssues(results []model.LighthouseResult) []model.Issue
- func SortedCategories(sr *ScoreResult) []string
- type CWVSummary
- type ClassifiedPage
- type ContentAnalyzer
- type CrawlBudgetAnalyzer
- type DuplicateDetector
- type LinkGraph
- func (g *LinkGraph) ComputePageRank(iterations int, dampingFactor float64) map[string]float64
- func (g *LinkGraph) Edges() int
- func (g *LinkGraph) ExportDOT(w io.Writer) error
- func (g *LinkGraph) ExportJSON(w io.Writer) error
- func (g *LinkGraph) Nodes() int
- func (g *LinkGraph) OutgoingLinks(url string) []string
- type LinkGraphChecker
- type PageClass
- type RedirectChecker
- type RedirectEntry
- type RedirectMap
- type ScoreResult
- type Technology
- type URLScore
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CheckCWVIssues ¶
func CheckCWVIssues(results []model.LighthouseResult) []model.Issue
CheckCWVIssues examines the aggregated Lighthouse results and returns issues for poor site-wide scores.
func SortedCategories ¶
func SortedCategories(sr *ScoreResult) []string
SortedCategories returns the category names sorted alphabetically. This is useful for deterministic output ordering.
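Iterating over a Go map yields keys in random order, so deterministic reports need an explicit sort. A minimal sketch of the documented behaviour (the helper name and map shape here stand in for the real ScoreResult):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCategories collects the category names from a score map and
// sorts them alphabetically so output ordering is deterministic.
func sortedCategories(categories map[string]int) []string {
	names := make([]string, 0, len(categories))
	for name := range categories {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	scores := map[string]int{"seo": 90, "content": 80, "links": 70}
	fmt.Println(sortedCategories(scores)) // [content links seo]
}
```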
Types ¶
type CWVSummary ¶
type CWVSummary struct {
	PageCount          int        `json:"page_count"`
	AvgPerformance     float64    `json:"avg_performance"`
	AvgAccessibility   float64    `json:"avg_accessibility"`
	AvgSEO             float64    `json:"avg_seo"`
	AvgBestPractices   float64    `json:"avg_best_practices"`
	WorstPerformance   []URLScore `json:"worst_performance"`
	WorstAccessibility []URLScore `json:"worst_accessibility"`
	PassRate           float64    `json:"pass_rate"`
}
CWVSummary aggregates Core Web Vitals / Lighthouse scores across all audited pages.
func AggregateCWV ¶
func AggregateCWV(results []model.LighthouseResult) *CWVSummary
AggregateCWV computes aggregate Lighthouse scores from the given results. It returns nil if results is empty.
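The aggregation is a straightforward mean over per-page scores plus a pass rate. A sketch under stated assumptions: the stand-in struct and the 0.9 pass threshold are illustrative, not the actual model.LighthouseResult fields or the package's threshold.

```go
package main

import "fmt"

// lighthouseResult is a stand-in for model.LighthouseResult; the real
// field names are an assumption for illustration.
type lighthouseResult struct {
	URL         string
	Performance float64 // 0-1 Lighthouse score
}

// aggregate sketches the averaging AggregateCWV describes: a mean
// performance score plus a pass rate (here, share of pages >= 0.9).
// It returns ok=false for empty input, mirroring the nil return.
func aggregate(results []lighthouseResult) (avg, passRate float64, ok bool) {
	if len(results) == 0 {
		return 0, 0, false
	}
	var sum float64
	var passed int
	for _, r := range results {
		sum += r.Performance
		if r.Performance >= 0.9 {
			passed++
		}
	}
	n := float64(len(results))
	return sum / n, float64(passed) / n, true
}

func main() {
	avg, pass, _ := aggregate([]lighthouseResult{
		{URL: "/", Performance: 0.95},
		{URL: "/about", Performance: 0.65},
	})
	fmt.Printf("avg=%.2f pass=%.2f\n", avg, pass) // avg=0.80 pass=0.50
}
```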
type ClassifiedPage ¶
type ClassifiedPage struct {
	URL   string    `json:"url"`
	Class PageClass `json:"class"`
	Score float64   `json:"score"` // confidence 0-1
}
ClassifiedPage pairs a page URL with its detected class and confidence.
func ClassifyPages ¶
func ClassifyPages(pages []*model.Page) []ClassifiedPage
ClassifyPages classifies each page by applying heuristic rules and returning the highest-scoring class. Pages that match no rules are classified as "other" with a score of 0.
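A sketch of the kind of heuristic rule set this describes. These particular path rules and confidence scores are illustrative assumptions, not the package's actual rules; only the "no match means other with score 0" behaviour is taken from the documentation.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// classify applies illustrative URL-path heuristics and returns the
// matched class with a confidence score; unmatched pages fall through
// to "other" with a score of 0, as ClassifyPages documents.
func classify(rawURL string) (class string, score float64) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "other", 0
	}
	path := strings.ToLower(u.Path)
	switch {
	case path == "" || path == "/":
		return "homepage", 1.0
	case strings.Contains(path, "/blog/"):
		return "blog", 0.9
	case strings.Contains(path, "/product"):
		return "product", 0.8
	case strings.Contains(path, "/contact"):
		return "contact", 0.9
	default:
		return "other", 0 // no rule matched
	}
}

func main() {
	for _, u := range []string{"https://example.com/", "https://example.com/blog/hello"} {
		c, s := classify(u)
		fmt.Printf("%s -> %s (%.1f)\n", u, c, s)
	}
}
```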
type ContentAnalyzer ¶
type ContentAnalyzer struct{}
ContentAnalyzer checks per-page content quality metrics.
func NewContentAnalyzer ¶
func NewContentAnalyzer() *ContentAnalyzer
NewContentAnalyzer returns a new ContentAnalyzer.
func (*ContentAnalyzer) Name ¶
func (c *ContentAnalyzer) Name() string
Name returns the checker name.
type CrawlBudgetAnalyzer ¶
type CrawlBudgetAnalyzer struct{}
CrawlBudgetAnalyzer detects crawl budget waste across a site.
func NewCrawlBudgetAnalyzer ¶
func NewCrawlBudgetAnalyzer() *CrawlBudgetAnalyzer
NewCrawlBudgetAnalyzer returns a new CrawlBudgetAnalyzer.
func (*CrawlBudgetAnalyzer) Name ¶
func (c *CrawlBudgetAnalyzer) Name() string
Name returns the checker name.
type DuplicateDetector ¶
type DuplicateDetector struct {
	// ThinContentThreshold is the minimum word count below which a page is
	// flagged as thin content. Defaults to defaultThinContentThreshold.
	ThinContentThreshold int
}
DuplicateDetector finds exact duplicates, near-duplicates, and thin content across a set of crawled pages.
func NewDuplicateDetector ¶
func NewDuplicateDetector() *DuplicateDetector
NewDuplicateDetector returns a DuplicateDetector with default settings.
func (*DuplicateDetector) Analyze ¶
func (d *DuplicateDetector) Analyze(pages []*model.Page) []model.Issue
Analyze examines all pages for exact duplicates, near-duplicates, and thin content.
func (*DuplicateDetector) CheckSite ¶
CheckSite runs duplicate and thin content detection across all pages.
func (*DuplicateDetector) Name ¶
func (d *DuplicateDetector) Name() string
Name returns the checker name.
type LinkGraph ¶
type LinkGraph struct {
	// contains filtered or unexported fields
}
LinkGraph represents the internal link structure of a website.
func BuildGraph ¶
BuildGraph constructs a LinkGraph from a set of crawled pages. Only internal links (same host) are included in the graph.
func (*LinkGraph) ComputePageRank ¶
func (g *LinkGraph) ComputePageRank(iterations int, dampingFactor float64) map[string]float64
ComputePageRank runs the iterative PageRank algorithm.
Parameters:
- iterations: number of iterations to run (use 0 for default of 20)
- dampingFactor: the damping factor d (use 0 for default of 0.85)
Returns a map of URL to PageRank score.
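The iteration this describes can be sketched standalone. This is the standard update rule with the documented defaults (20 iterations, damping factor 0.85); dangling-node mass redistribution is omitted for brevity, so this is a simplification, not the package's exact implementation.

```go
package main

import "fmt"

// pageRank runs the iterative PageRank update
// rank[v] = (1-d)/N + d * sum(rank[u]/outdeg(u)) over in-neighbours u,
// with zero arguments falling back to the documented defaults.
func pageRank(adj map[string][]string, iterations int, d float64) map[string]float64 {
	if iterations <= 0 {
		iterations = 20
	}
	if d <= 0 {
		d = 0.85
	}
	n := float64(len(adj))
	rank := make(map[string]float64, len(adj))
	for node := range adj {
		rank[node] = 1 / n // uniform initial distribution
	}
	for i := 0; i < iterations; i++ {
		next := make(map[string]float64, len(adj))
		for node := range adj {
			next[node] = (1 - d) / n
		}
		for node, outs := range adj {
			if len(outs) == 0 {
				continue // dangling node; handling omitted
			}
			share := d * rank[node] / float64(len(outs))
			for _, to := range outs {
				next[to] += share
			}
		}
		rank = next
	}
	return rank
}

func main() {
	// Three pages in a cycle: ranks converge to 1/3 each.
	g := map[string][]string{"a": {"b"}, "b": {"c"}, "c": {"a"}}
	fmt.Printf("%.3f\n", pageRank(g, 0, 0)["a"]) // 0.333
}
```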
func (*LinkGraph) ExportJSON ¶
func (g *LinkGraph) ExportJSON(w io.Writer) error
ExportJSON writes the link graph as a JSON adjacency list to w.
func (*LinkGraph) OutgoingLinks ¶
func (g *LinkGraph) OutgoingLinks(url string) []string
OutgoingLinks returns the outgoing internal links for a URL.
type LinkGraphChecker ¶
type LinkGraphChecker struct{}
LinkGraphChecker wraps LinkGraph analysis as a SiteChecker for the audit registry.
func NewLinkGraphChecker ¶
func NewLinkGraphChecker() *LinkGraphChecker
NewLinkGraphChecker returns a new LinkGraphChecker.
func (*LinkGraphChecker) CheckSite ¶
CheckSite builds the link graph, computes PageRank, and generates issues.
func (*LinkGraphChecker) Name ¶
func (c *LinkGraphChecker) Name() string
Name returns the checker name.
type PageClass ¶
type PageClass string
PageClass represents the classification of a page.
const (
	ClassHomepage PageClass = "homepage"
	ClassBlog     PageClass = "blog"
	ClassProduct  PageClass = "product"
	ClassCategory PageClass = "category"
	ClassContact  PageClass = "contact"
	ClassAbout    PageClass = "about"
	ClassLegal    PageClass = "legal"
	ClassAPI      PageClass = "api"
	ClassOther    PageClass = "other"
)
type RedirectChecker ¶
type RedirectChecker struct{}
RedirectChecker implements SiteChecker for redirect-related issues.
func NewRedirectChecker ¶
func NewRedirectChecker() *RedirectChecker
NewRedirectChecker returns a new RedirectChecker.
func (*RedirectChecker) CheckSite ¶
CheckSite runs site-wide redirect analysis and returns any issues found.
func (*RedirectChecker) Name ¶
func (c *RedirectChecker) Name() string
Name returns the checker name.
type RedirectEntry ¶
type RedirectEntry struct {
	From     string   `json:"from"`
	To       string   `json:"to"`
	Chain    []string `json:"chain"`
	Hops     int      `json:"hops"`
	LinkedBy []string `json:"linked_by"`
}
RedirectEntry represents a single redirect, including its full chain, hop count, and the set of pages that link to the redirecting URL.
type RedirectMap ¶
type RedirectMap struct {
	Entries []RedirectEntry `json:"entries"`
}
RedirectMap holds all redirect entries discovered during analysis.
func AnalyzeRedirects ¶
func AnalyzeRedirects(pages []*model.Page) *RedirectMap
AnalyzeRedirects scans all pages for non-empty RedirectChain, builds redirect entries with full chain info, cross-references pages that link to the redirecting URL, and returns the result sorted by hop count descending.
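The final ordering step can be sketched on its own: sorting entries by hop count descending surfaces the longest chains first. The local struct below mirrors the exported RedirectEntry fields.

```go
package main

import (
	"fmt"
	"sort"
)

// redirectEntry mirrors the exported RedirectEntry fields.
type redirectEntry struct {
	From  string
	To    string
	Chain []string
	Hops  int
}

// sortByHops orders entries by hop count descending, so the worst
// redirect chains appear at the top of the report, as AnalyzeRedirects
// documents. SliceStable keeps equal-hop entries in discovery order.
func sortByHops(entries []redirectEntry) {
	sort.SliceStable(entries, func(i, j int) bool {
		return entries[i].Hops > entries[j].Hops
	})
}

func main() {
	entries := []redirectEntry{
		{From: "/a", To: "/b", Hops: 1},
		{From: "/old", To: "/new", Chain: []string{"/old", "/tmp", "/new"}, Hops: 2},
	}
	sortByHops(entries)
	fmt.Println(entries[0].From) // /old
}
```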
type ScoreResult ¶
type ScoreResult struct {
	Overall     int            `json:"overall"`    // 0-100
	Categories  map[string]int `json:"categories"` // category -> score
	Breakdown   map[string]int `json:"breakdown"`  // severity -> count
	TotalPages  int            `json:"total_pages"`
	TotalIssues int            `json:"total_issues"`
}
ScoreResult holds the computed health score for a crawled site.
func ComputeScore ¶
func ComputeScore(result *model.CrawlResult) *ScoreResult
ComputeScore calculates a 0-100 aggregate health score from the issues in a CrawlResult. The overall score starts at 100 and is decremented per issue (critical=-10, warning=-3, info=-1), floored at 0. Category scores are computed independently using the same algorithm.
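The decrement rule above is simple arithmetic and can be sketched directly from the documented weights (critical=-10, warning=-3, info=-1, floored at 0):

```go
package main

import "fmt"

// healthScore applies the documented scoring rule: start at 100,
// subtract 10 per critical, 3 per warning, and 1 per info issue,
// and floor the result at 0.
func healthScore(critical, warning, info int) int {
	score := 100 - 10*critical - 3*warning - 1*info
	if score < 0 {
		return 0
	}
	return score
}

func main() {
	fmt.Println(healthScore(2, 3, 1))  // 100 - 20 - 9 - 1 = 70
	fmt.Println(healthScore(12, 0, 0)) // would be -20, floored to 0
}
```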
type Technology ¶
type Technology struct {
	Name     string `json:"name"`
	Category string `json:"category"` // cms, framework, analytics, cdn, etc.
	Version  string `json:"version,omitempty"`
	Evidence string `json:"evidence"` // what signal detected it
}
Technology represents a detected technology on a website.
func DetectTechnologies ¶
func DetectTechnologies(pages []*model.Page) []Technology
DetectTechnologies scans the provided pages for known technology signatures and returns a deduplicated, sorted list of detected technologies.
Body and asset checks primarily use the first (homepage) page; header checks are applied across all pages for CDN detection.
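Header-based CDN detection of the kind described can be sketched as a lookup against known response-header signatures. The two signatures below (a "cloudflare" Server header, CloudFront's X-Amz-Cf-Id header) are real-world examples, but this table is illustrative, not the package's actual rule set.

```go
package main

import (
	"fmt"
	"strings"
)

// detectCDN inspects response headers for known CDN signatures and
// returns the detected name plus the evidence string that matched.
func detectCDN(headers map[string]string) (name, evidence string, ok bool) {
	server := strings.ToLower(headers["Server"])
	switch {
	case strings.Contains(server, "cloudflare"):
		return "Cloudflare", "Server: " + headers["Server"], true
	case headers["X-Amz-Cf-Id"] != "":
		return "CloudFront", "X-Amz-Cf-Id header present", true
	}
	return "", "", false
}

func main() {
	name, evidence, _ := detectCDN(map[string]string{"Server": "cloudflare"})
	fmt.Println(name, "-", evidence)
}
```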