benchmark

package
v0.8.0 Latest
Warning: This package is not in the latest version of its module.

Published: Mar 29, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package benchmark provides a self-validating benchmark harness for Synapses.

Each Scenario derives ground truth from the current graph state — no hardcoded node IDs. This makes benchmarks portable across any indexed codebase.

Metrics are structural and deterministic: precision, recall, F1, latency. No LLM judge needed — we measure against the graph's own topology.
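The structural metrics named above can be sketched in isolation. The following is a minimal, hypothetical illustration of how precision, recall, and F1 fall out of two sets of node IDs; it is not the package's actual implementation:

```go
package main

import "fmt"

// f1Metrics computes precision, recall, and F1 for one query given
// the ground-truth node IDs and the IDs the query returned.
// Sketch only; the real implementation may differ.
func f1Metrics(expected, returned []string) (precision, recall, f1 float64) {
	truth := make(map[string]bool, len(expected))
	for _, id := range expected {
		truth[id] = true
	}
	relevant := 0 // |expected ∩ returned|
	for _, id := range returned {
		if truth[id] {
			relevant++
		}
	}
	if len(returned) > 0 {
		precision = float64(relevant) / float64(len(returned))
	}
	if len(expected) > 0 {
		recall = float64(relevant) / float64(len(expected))
	}
	if precision+recall > 0 {
		f1 = 2 * precision * recall / (precision + recall)
	}
	return
}

func main() {
	p, r, f := f1Metrics([]string{"a", "b", "c", "d"}, []string{"a", "b", "x"})
	fmt.Printf("precision=%.2f recall=%.2f f1=%.2f\n", p, r, f)
}
```

Because the ground truth is derived from the graph itself, these numbers are fully deterministic for a given index state.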

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuiltinScenarioNames

func BuiltinScenarioNames() []string

BuiltinScenarioNames returns the names of all built-in scenarios.

Types

type QueryResult

type QueryResult struct {
	Label     string  `json:"label"`
	Precision float64 `json:"precision"`
	Recall    float64 `json:"recall"`
	F1        float64 `json:"f1"`
	LatencyMs float64 `json:"latency_ms"`
	Expected  int     `json:"expected"` // ground truth size
	Returned  int     `json:"returned"` // result size
	Relevant  int     `json:"relevant"` // |expected ∩ returned|
}

QueryResult holds the outcome of a single benchmark query.

type Result

type Result struct {
	Timestamp  string           `json:"timestamp"`
	RepoID     string           `json:"repo_id"`
	NodeCount  int              `json:"node_count"`
	EdgeCount  int              `json:"edge_count"`
	Scenarios  []ScenarioResult `json:"scenarios"`
	Summary    Summary          `json:"summary"`
	DurationMs int64            `json:"total_duration_ms"`
	Note       string           `json:"note,omitempty"` // informational note (e.g. scenario name normalization)
}

Result holds the outcome of running a complete benchmark suite.

func RunAll

func RunAll(g *graph.Graph, st *store.Store) *Result

RunAll executes all built-in scenarios and returns the aggregate result.

func RunScenarios

func RunScenarios(g *graph.Graph, st *store.Store, scenarios []Scenario) *Result

RunScenarios executes the given scenarios and returns the aggregate result.

type Scenario

type Scenario struct {
	Name        string
	Description string
	// Run executes the scenario against the live graph and store.
	// Returns query results. Error means the scenario couldn't run (not a quality failure).
	Run func(g *graph.Graph, st *store.Store) ([]QueryResult, error)
	// PassThreshold is the minimum average F1 to consider this scenario "passed".
	PassThreshold float64
}

Scenario defines a benchmark that derives ground truth from the graph.
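A custom scenario might look like the sketch below. The Graph and Store types here are hypothetical stand-ins for the real graph.Graph and store.Store (used only to keep the example self-contained), and the "self-lookup" scenario is invented for illustration; note how Run returns an error, not a failed result, when the graph cannot support the scenario:

```go
package main

import (
	"errors"
	"fmt"
)

// Graph and Store are hypothetical stand-ins for the real
// graph.Graph and store.Store types.
type Graph struct{ NodeCount int }
type Store struct{}

type QueryResult struct {
	Label                            string
	Precision, Recall, F1, LatencyMs float64
}

// Scenario mirrors the package type, using the stand-ins above.
type Scenario struct {
	Name, Description string
	Run               func(g *Graph, st *Store) ([]QueryResult, error)
	PassThreshold     float64
}

// selfLookup builds a hypothetical custom scenario. Per the Run
// contract, an error means "couldn't run", not "poor quality".
func selfLookup() Scenario {
	return Scenario{
		Name:          "self-lookup",
		Description:   "each node should be retrievable by its own ID",
		PassThreshold: 0.8,
		Run: func(g *Graph, st *Store) ([]QueryResult, error) {
			if g.NodeCount < 2 {
				return nil, errors.New("graph too small")
			}
			return []QueryResult{{Label: "self", Precision: 1, Recall: 1, F1: 1}}, nil
		},
	}
}

func main() {
	s := selfLookup()
	qs, err := s.Run(&Graph{NodeCount: 10}, &Store{})
	fmt.Println(len(qs), err)
}
```

With the real types, a slice of such scenarios could be passed to RunScenarios alongside the built-ins.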

func BuiltinScenarios

func BuiltinScenarios() []Scenario

BuiltinScenarios returns the standard set of scenarios shipped with Synapses. These are listed in BuiltinScenarioNames() for MCP tool discovery.

func FindScenario

func FindScenario(name string) (Scenario, error)

FindScenario returns the named scenario, or an error if not found.

type ScenarioResult

type ScenarioResult struct {
	Name         string        `json:"name"`
	Description  string        `json:"description"`
	Queries      []QueryResult `json:"queries"`
	Passed       bool          `json:"passed"`
	AvgF1        float64       `json:"avg_f1"`
	AvgLatencyMs float64       `json:"avg_latency_ms"`
	Error        string        `json:"error,omitempty"`
}

ScenarioResult holds the outcome of a single scenario.
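Given the PassThreshold field on Scenario, the Passed and AvgF1 fields could plausibly be derived as in this sketch (assumed logic, not confirmed from the source):

```go
package main

import "fmt"

// QueryResult mirrors the package's per-query outcome, reduced to
// the fields this sketch needs.
type QueryResult struct {
	Label string
	F1    float64
}

// score averages the per-query F1 values and compares the average
// against the scenario's PassThreshold. Sketch only.
func score(queries []QueryResult, passThreshold float64) (avgF1 float64, passed bool) {
	if len(queries) == 0 {
		return 0, false
	}
	var sum float64
	for _, q := range queries {
		sum += q.F1
	}
	avgF1 = sum / float64(len(queries))
	return avgF1, avgF1 >= passThreshold
}

func main() {
	qs := []QueryResult{{"callers", 0.9}, {"callees", 0.7}}
	avg, ok := score(qs, 0.75)
	fmt.Printf("avg_f1=%.2f passed=%v\n", avg, ok)
}
```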

type Summary

type Summary struct {
	ScenariosRun     int     `json:"scenarios_run"`
	ScenariosPassed  int     `json:"scenarios_passed"`
	ScenariosErrored int     `json:"scenarios_errored"` // scenarios that could not run (graph too small, etc.)
	AvgPrecision     float64 `json:"avg_precision"`
	AvgRecall        float64 `json:"avg_recall"`
	AvgF1            float64 `json:"avg_f1"`
	AvgLatencyMs     float64 `json:"avg_latency_ms"`
	P95LatencyMs     float64 `json:"p95_latency_ms"`
}

Summary aggregates across all scenarios.
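P95LatencyMs could be computed in several ways; the nearest-rank method below is one common choice, shown purely as a sketch (the package's actual interpolation is not documented here):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// p95 returns the 95th-percentile latency using the nearest-rank
// method: sort the samples and take the ceil(0.95*n)-th value.
// Sketch only; the real computation may interpolate differently.
func p95(latenciesMs []float64) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), latenciesMs...) // don't mutate the caller's slice
	sort.Float64s(sorted)
	idx := int(math.Ceil(0.95*float64(len(sorted)))) - 1
	if idx < 0 {
		idx = 0
	}
	return sorted[idx]
}

func main() {
	var ls []float64
	for i := 1; i <= 20; i++ {
		ls = append(ls, float64(i))
	}
	fmt.Println(p95(ls)) // → 19
}
```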
