Documentation ¶
Overview ¶
Package lighthouse implements the sampler, scheduler, and runner used to capture Lighthouse performance audits for a small subset of pages during each crawl. See docs/plans/lighthouse-performance-reports.md for the full design.
Constants ¶
This section is empty.
Variables ¶
var ErrMemoryShed = errors.New("lighthouse runner shed audit due to low memory")
ErrMemoryShed is returned by LocalRunner.Run when free memory is below LIGHTHOUSE_MEMORY_SHED_THRESHOLD_MB. The consumer treats this like a shutdown cancellation: leave the lighthouse_runs row in 'running' so XAUTOCLAIM redelivers it once memory recovers, and skip the XAck so the Redis stream entry survives.
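A minimal sketch of the consumer-side handling described above; markRunFailed, recordRunResult, and ackStreamEntry are illustrative stand-ins for the real consumer plumbing, not part of this package:

    func handleAudit(ctx context.Context, runner lighthouse.Runner, req lighthouse.AuditRequest) error {
        res, err := runner.Run(ctx, req)
        switch {
        case errors.Is(err, lighthouse.ErrMemoryShed):
            // Leave the lighthouse_runs row in 'running' and skip the XAck so the
            // Redis stream entry survives; XAUTOCLAIM redelivers it once memory recovers.
            return nil
        case err != nil:
            markRunFailed(ctx, req.RunID, err) // assumed helper: flips the row to 'failed'
            return ackStreamEntry(ctx)         // assumed helper: XAck so the entry is not replayed
        default:
            recordRunResult(ctx, req.RunID, res) // assumed helper: persists metrics and ReportKey
            return ackStreamEntry(ctx)
        }
    }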
var ErrRunnerNotImplemented = errors.New("lighthouse local runner not implemented yet")
ErrRunnerNotImplemented is returned by the local runner shim until Phase 3 wires Chromium into the analysis-app image. Keeping the error here means the scheduler and DB layer can be exercised end-to-end with the stub before any Chromium work lands.
Functions ¶
func PerBand ¶
PerBand returns the number of audits to schedule per extreme band for a job with completedPages successful tasks so far. Floored at 1 (so even a 1-page site gets one audit per band) and capped at 15 (so a 10,000-page crawl never queues more than 30 audits per job).
Using math.Floor (rather than math.Round) keeps the curve from overshooting the cap at exactly 10,000 pages and matches the published anchor points exactly.
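The exact curve lives in the implementation and the design doc; purely as an illustration of the floor/cap behaviour described above, a floored logarithmic ramp that happens to hit the stated anchors (1 per band for a 1-page site, 15 at 10,000 pages) could look like this sketch. The 3.75 coefficient is an assumption, not the package's published formula:

    func perBandIllustration(completedPages int) int {
        // Hypothetical curve, NOT the package's actual formula: floor a logarithmic
        // ramp, then clamp to the documented floor (1) and cap (15).
        if completedPages < 1 {
            completedPages = 1
        }
        n := int(math.Floor(3.75 * math.Log10(float64(completedPages))))
        switch {
        case n < 1:
            return 1
        case n > 15:
            return 15
        default:
            return n
        }
    }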
func SanitiseAuditURL ¶
SanitiseAuditURL strips query strings and fragments before logging. Lighthouse audit URLs come from customer crawls and can carry session tokens, signed-link tokens, or other low-entropy PII in the query string; the runner does not need them in central logs. Exported so the analysis service (cmd/analysis) can apply the same rule to its own info-level logs without redefining the helper.
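A sketch of the sanitisation rule described above, assuming the input parses as a URL; the package's own implementation may handle edge cases differently:

    func sanitiseForLogs(raw string) string {
        u, err := url.Parse(raw)
        if err != nil {
            // In this sketch unparseable input is returned untouched; the real
            // helper may choose to redact it instead.
            return raw
        }
        u.RawQuery = ""
        u.Fragment = ""
        return u.String()
    }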
Types ¶
type AuditRequest ¶
type AuditRequest struct {
    RunID        int64
    JobID        string
    PageID       int
    SourceTaskID string
    URL          string
    Profile      Profile
    Timeout      time.Duration
}
AuditRequest is the input handed to a Runner. The scheduler builds these from a lighthouse_runs row plus the matching tasks/pages metadata. Timeout is the per-run budget; runners must respect it.
SourceTaskID is the lighthouse_runs.source_task_id (empty when the FK was NULLed via ON DELETE SET NULL). The local runner uses it to co-locate the report with the matching crawl artefact under jobs/{JobID}/tasks/{SourceTaskID}/. When it is empty, the runner falls back to a run-id-keyed path so a deleted parent task doesn't lose the report.
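A sketch of the co-location rule above; only the jobs/{JobID}/tasks/{SourceTaskID}/ prefix comes from this documentation, and the run-id-keyed fallback format shown here is an assumption:

    func reportKeyPrefix(req lighthouse.AuditRequest) string {
        if req.SourceTaskID != "" {
            return fmt.Sprintf("jobs/%s/tasks/%s/", req.JobID, req.SourceTaskID)
        }
        // Parent task deleted (FK set NULL): fall back to a run-id-keyed path
        // (illustrative format) so the report is still stored somewhere findable.
        return fmt.Sprintf("jobs/%s/lighthouse-runs/%d/", req.JobID, req.RunID)
    }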
type AuditResult ¶
type AuditResult struct {
    PerformanceScore *int
    LCPMs            *int
    CLS              *float64
    INPMs            *int
    TBTMs            *int
    FCPMs            *int
    SpeedIndexMs     *int
    TTFBMs           *int
    TotalByteWeight  *int64
    ReportKey        string
    Duration         time.Duration
}
AuditResult is the output of a successful audit. ReportKey is the R2 object key that the runner uploaded the full Lighthouse JSON to; empty for the stub runner since it doesn't write to R2 in Phase 1.
Optional metric fields are pointers so we can distinguish "not produced" from "produced as zero" — Lighthouse occasionally omits metrics on pages it can't audit cleanly.
func ParseReport ¶
func ParseReport(raw []byte) (AuditResult, error)
ParseReport turns a Lighthouse JSON report into an AuditResult. The Duration field is left zero — the runner stamps wall time from the outside since the report itself doesn't capture exec wrapper cost.
ReportKey is left empty here for the same reason: it's the R2 key, which only the runner that just uploaded the report can know.
Returns an error only on malformed JSON; a report that is well-formed but missing a given audit produces a nil pointer in that field, which preserves the "not measured" semantics of the lighthouse_runs columns.
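A short usage sketch showing the nil-pointer semantics; the logging and the summarise wrapper are illustrative only:

    func summarise(raw []byte) error {
        res, err := lighthouse.ParseReport(raw)
        if err != nil {
            return fmt.Errorf("malformed lighthouse report: %w", err)
        }
        if res.PerformanceScore != nil {
            log.Printf("performance score: %d", *res.PerformanceScore)
        } else {
            log.Print("performance score not produced for this page")
        }
        // Duration and ReportKey stay zero/empty here; the runner stamps both.
        return nil
    }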
type CompletedTask ¶
CompletedTask is the input shape the sampler needs from a completed crawl task. ResponseTime is the HTTP response time in milliseconds (tasks.response_time, BIGINT). The sampler does not need anything else from the row.
type LocalRunner ¶
type LocalRunner struct {
    // contains filtered or unexported fields
}
LocalRunner shells out to the bundled lighthouse CLI to perform real audits. It implements the Runner interface and is, as of Phase 3, the only alternative to StubRunner.
func NewLocalRunner ¶
func NewLocalRunner(cfg LocalRunnerConfig) (*LocalRunner, error)
NewLocalRunner constructs a LocalRunner with the supplied config. Returns an error if the config is missing pieces the runner needs to operate (binary paths, bucket, archive provider) — better to fail at boot than to discover the missing config on the first audit.
func (*LocalRunner) Run ¶
func (l *LocalRunner) Run(ctx context.Context, req AuditRequest) (AuditResult, error)
Run executes a single Lighthouse audit, parses the JSON result, uploads the gzipped raw report to R2, and returns the populated AuditResult. Honours req.Timeout via context wrapping; the caller's XAck/Fail flow on the consumer side decides what to do with errors.
Retry policy: at most one retry on a transient Chromium failure, recognised via stderr substring match. After that, the most recent error (with stderr tail) is propagated up.
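A sketch of the retry shape described above; runOnce and the stderr substrings are assumptions standing in for the real matcher inside LocalRunner:

    func runWithOneRetry(ctx context.Context, req lighthouse.AuditRequest) (lighthouse.AuditResult, error) {
        res, stderr, err := runOnce(ctx, req) // assumed helper: a single CLI invocation
        if err != nil && isTransientChromiumFailure(stderr) {
            res, stderr, err = runOnce(ctx, req) // at most one retry
        }
        if err != nil {
            return res, fmt.Errorf("lighthouse audit failed: %w (stderr tail: %s)", err, stderr)
        }
        return res, nil
    }

    func isTransientChromiumFailure(stderr string) bool {
        // Illustrative substrings only; the real list lives in LocalRunner.
        return strings.Contains(stderr, "Unable to connect to Chrome") ||
            strings.Contains(stderr, "Target closed")
    }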
type LocalRunnerConfig ¶
type LocalRunnerConfig struct {
    LighthouseBin string
    ChromiumBin   string
    Provider      archive.ColdStorageProvider
    Bucket        string
    MemoryShedMB  int
    ProfilePreset Profile // defaults to ProfileMobile if empty
}
LocalRunnerConfig captures the bits the local runner needs from the analysis service config. Kept separate from cmd/analysis so the runner is testable without the whole consumer scaffolding.
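A boot-time construction sketch; the LIGHTHOUSE_BIN, CHROMIUM_BIN, and LIGHTHOUSE_R2_BUCKET environment variable names and the coldStorage provider variable are assumptions, while LIGHTHOUSE_MEMORY_SHED_THRESHOLD_MB is the variable named in ErrMemoryShed's documentation:

    cfg := lighthouse.LocalRunnerConfig{
        LighthouseBin: os.Getenv("LIGHTHOUSE_BIN"),       // assumed env var name
        ChromiumBin:   os.Getenv("CHROMIUM_BIN"),         // assumed env var name
        Provider:      coldStorage,                       // archive.ColdStorageProvider wired during bootstrap
        Bucket:        os.Getenv("LIGHTHOUSE_R2_BUCKET"), // assumed env var name
        MemoryShedMB:  shedThresholdMB,                   // parsed from LIGHTHOUSE_MEMORY_SHED_THRESHOLD_MB
        // ProfilePreset left empty: defaults to ProfileMobile.
    }
    runner, err := lighthouse.NewLocalRunner(cfg)
    if err != nil {
        log.Fatalf("lighthouse runner misconfigured: %v", err) // fail at boot, not on the first audit
    }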
type Profile ¶
type Profile string
Profile selects between Lighthouse's mobile and desktop presets. v1 only schedules mobile audits; desktop is reserved for Phase 5.
type Runner ¶
type Runner interface {
    Run(ctx context.Context, req AuditRequest) (AuditResult, error)
}
Runner executes a single Lighthouse audit. The Phase 1 stub implementation returns canned data so the rest of the pipeline can be exercised before Chromium lands. Phase 3 adds LocalRunner, which shells out to the bundled lighthouse binary.
type Sample ¶
type Sample struct {
    Task CompletedTask
    Band SelectionBand
}
Sample is one task selected by the sampler, tagged with the band it came from. The scheduler turns each Sample into a lighthouse_runs row plus an outbox entry.
func SelectSamples ¶
func SelectSamples(completed []CompletedTask, milestone int, alreadySampled map[int]SelectionBand) []Sample
SelectSamples picks fastest and slowest tasks from completed, enforcing a global per-band cap of PerBand(len(completed)) across the lifetime of a job. alreadySampled maps every page_id already queued for the job (any band) to the band it was scheduled under; the function uses it both to dedupe (a page never appears twice) and to count existing fastest/slowest rows so each milestone only tops up the band quotas — it never re-spends them.
BandReconcile rows in alreadySampled count toward dedupe but not toward fastest/slowest quotas: at milestone 100 the scheduler retags whatever the sampler picks as reconcile, so by construction reconcile rows shouldn't exist before this call. If they do (e.g. a duplicate milestone-100 fire) the dedupe still keeps page IDs disjoint while the quota math correctly treats the existing rows as already-spent budget.
The fastest and slowest output sets are guaranteed disjoint: when fewer than fastestNeeded+slowestNeeded candidates remain, fastest takes priority and slowest fills from what's left.
Order in the returned slice is fastest band first (ascending response_time), then slowest band (descending response_time). The scheduler treats this slice as the per-milestone work list.
The function is pure — it does not touch the database or the network — so it is straightforward to unit test.
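A unit-test sketch that leans on that purity; the CompletedTask field names used here (PageID, ResponseTime) are assumed from the surrounding documentation rather than taken from the struct definition:

    func TestSelectSamplesNeverRepicksAPage(t *testing.T) {
        completed := []lighthouse.CompletedTask{
            {PageID: 1, ResponseTime: 120},
            {PageID: 2, ResponseTime: 4500},
            {PageID: 3, ResponseTime: 90},
        }
        already := map[int]lighthouse.SelectionBand{3: lighthouse.BandFastest}

        samples := lighthouse.SelectSamples(completed, 10, already)

        for _, s := range samples {
            if _, dup := already[s.Task.PageID]; dup {
                t.Fatalf("page %d was sampled twice", s.Task.PageID)
            }
        }
    }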
type Scheduler ¶
type Scheduler struct {
    // contains filtered or unexported fields
}
Scheduler turns crawl progress into pending lighthouse_runs rows and matching task_outbox entries. One scheduler is bound to a single SchedulerDB / TxRunner pair; the JobManager invokes OnMilestone via the OnProgressMilestone callback whenever a flush observes a 10% boundary crossing.
func NewScheduler ¶
func NewScheduler(database SchedulerDB, queue TxRunner) *Scheduler
NewScheduler constructs a Scheduler. Callers are expected to wire the returned Scheduler.OnMilestone into JobManager.OnProgressMilestone during bootstrap.
func (*Scheduler) OnMilestone ¶
OnMilestone runs the sampler at the given milestone (0..100) and enqueues any newly chosen audits. Safe to call from the milestone callback path: errors are returned for the caller to log, but the caller is expected to never let a scheduler failure block the batch loop.
At milestone == 100, samples are tagged as the 'reconcile' band so the analytics layer can distinguish opportunistic per-decade picks from the catch-up pass at job completion.
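A bootstrap wiring sketch; the OnMilestone and OnProgressMilestone signatures shown here are assumptions, and only the direction of the wiring comes from the documentation above:

    sched := lighthouse.NewScheduler(database, queue)
    jobManager.OnProgressMilestone = func(ctx context.Context, jobID string, milestone int) {
        if err := sched.OnMilestone(ctx, jobID, milestone); err != nil {
            // Log and move on: a scheduler failure must never block the batch loop.
            log.Printf("lighthouse scheduler: job %s milestone %d: %v", jobID, milestone, err)
        }
    }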
type SchedulerDB ¶
type SchedulerDB interface {
    GetCompletedTasksForLighthouseSampling(ctx context.Context, jobID string) ([]db.CompletedTaskForSampling, error)
    GetLighthouseRunPageBands(ctx context.Context, jobID string) (map[int]db.LighthouseSelectionBand, error)
}
SchedulerDB is the narrow subset of *db.DB the scheduler needs. Kept as an interface so unit tests can inject a fake without standing up a full Postgres pool.
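A test fake along those lines; the canned fields are whatever a given test needs:

    type fakeSchedulerDB struct {
        tasks []db.CompletedTaskForSampling
        bands map[int]db.LighthouseSelectionBand
    }

    func (f *fakeSchedulerDB) GetCompletedTasksForLighthouseSampling(ctx context.Context, jobID string) ([]db.CompletedTaskForSampling, error) {
        return f.tasks, nil
    }

    func (f *fakeSchedulerDB) GetLighthouseRunPageBands(ctx context.Context, jobID string) (map[int]db.LighthouseSelectionBand, error) {
        return f.bands, nil
    }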
type SelectionBand ¶
type SelectionBand string
SelectionBand identifies which extreme of the response-time distribution a sampled task came from. The reconcile band is reserved for the 100% pass run by the scheduler at job completion.
const (
    BandFastest   SelectionBand = "fastest"
    BandSlowest   SelectionBand = "slowest"
    BandReconcile SelectionBand = "reconcile"
)
type StubRunner ¶
type StubRunner struct{}
StubRunner is a deterministic Runner that returns canned metrics without launching Chromium. Used for local development, CI, and integration tests that drive a synthetic job through the full schedule → enqueue → record pipeline.
Metric values are fixed so test assertions stay simple. Tests that need varied data should construct a custom Runner rather than extending this one.
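One way to build such a custom Runner is a function adapter, sketched here with a synthetic-failure example:

    type runnerFunc func(ctx context.Context, req lighthouse.AuditRequest) (lighthouse.AuditResult, error)

    func (f runnerFunc) Run(ctx context.Context, req lighthouse.AuditRequest) (lighthouse.AuditResult, error) {
        return f(ctx, req)
    }

    // A runner that fails every audit, useful for exercising the failure path.
    var alwaysFails lighthouse.Runner = runnerFunc(func(ctx context.Context, req lighthouse.AuditRequest) (lighthouse.AuditResult, error) {
        return lighthouse.AuditResult{}, fmt.Errorf("synthetic failure for page %d", req.PageID)
    })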
func NewStubRunner ¶
func NewStubRunner() *StubRunner
NewStubRunner returns a StubRunner ready for use.
func (*StubRunner) Run ¶
func (s *StubRunner) Run(ctx context.Context, req AuditRequest) (AuditResult, error)
Run honours ctx cancellation and req.Timeout but otherwise sleeps briefly to approximate the cost of an audit, then returns a canned result.
type TxRunner ¶
type TxRunner interface {
    ExecuteWithContext(ctx context.Context, fn func(ctx context.Context, tx *sql.Tx) error) error
}
TxRunner runs the supplied function inside a Postgres transaction. Mirrors db.QueueExecutor.ExecuteWithContext so the scheduler can reuse the same retry / pool semantics the rest of the codebase already exercises.