benchreport

package
v0.32.3
Published: Jun 18, 2026 | License: Apache-2.0 | Imports: 4 | Imported by: 0

Documentation

Overview

Package benchreport defines the common local-node benchmark report: a single JSON shape emitted for every backend, model, and hardware profile, so that runtime latency and warm-reuse claims stay honest. The runner drives the backend-neutral runtime/transport.Session contract (EnsurePrefix / PrefillSuffix / Decode), so the same harness measures both llama.cpp and OpenVINO. The shape follows docs/blueprints/local-coding-node-goals.md ("Required benchmark report").

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ColdPrefill

type ColdPrefill struct {
	Tokens    int     `json:"tokens"`
	MS        float64 `json:"ms"`
	PromptTPS float64 `json:"prompt_tps"`
	TTFTMS    float64 `json:"ttft_ms"`
}

ColdPrefill measures a fresh prefill of the full stable prefix.
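
The package does not document how PromptTPS is derived from Tokens and MS. The sketch below assumes the conventional definition (prompt tokens divided by elapsed seconds); the helper promptTPS is hypothetical and is not part of benchreport, and the struct is a local mirror of the documented one so the example compiles on its own.

```go
package main

import "fmt"

// ColdPrefill mirrors the documented struct so this sketch is self-contained.
type ColdPrefill struct {
	Tokens    int     `json:"tokens"`
	MS        float64 `json:"ms"`
	PromptTPS float64 `json:"prompt_tps"`
	TTFTMS    float64 `json:"ttft_ms"`
}

// promptTPS is a hypothetical helper (an assumption, not the package's API):
// prompt tokens per second, computed as tokens * 1000 / milliseconds.
// It returns 0 for a non-positive duration to avoid dividing by zero.
func promptTPS(tokens int, ms float64) float64 {
	if ms <= 0 {
		return 0
	}
	return float64(tokens) * 1000 / ms
}

func main() {
	c := ColdPrefill{Tokens: 2048, MS: 512}
	c.PromptTPS = promptTPS(c.Tokens, c.MS)
	fmt.Printf("%.0f tokens/s\n", c.PromptTPS) // prints "4000 tokens/s"
}
```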

type Decode

type Decode struct {
	OutputTokens int     `json:"output_tokens"`
	TokensPerSec float64 `json:"tokens_per_sec"`
}

Decode measures generation throughput.

type EditedStable

type EditedStable struct {
	ExpectedCacheMiss bool `json:"expected_cache_miss"`
	ActualCacheMiss   bool `json:"actual_cache_miss"`
	OutputEqualsCold  bool `json:"output_equals_cold"`
}

EditedStable proves that a changed stable segment is a precise cache miss (the edited prefix is recomputed, not falsely reused) and that its output still equals a cold run.
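
A consumer of the report might gate on this section as follows. This is a minimal sketch: editedStableOK is a hypothetical helper, not something the package exports, and the struct is a local mirror of the documented one.

```go
package main

import "fmt"

// EditedStable mirrors the documented struct so this sketch is self-contained.
type EditedStable struct {
	ExpectedCacheMiss bool `json:"expected_cache_miss"`
	ActualCacheMiss   bool `json:"actual_cache_miss"`
	OutputEqualsCold  bool `json:"output_equals_cold"`
}

// editedStableOK is a hypothetical check: the observed cache behaviour must
// match the expectation, and the output must still equal the cold run.
func editedStableOK(e EditedStable) bool {
	return e.ActualCacheMiss == e.ExpectedCacheMiss && e.OutputEqualsCold
}

func main() {
	good := EditedStable{ExpectedCacheMiss: true, ActualCacheMiss: true, OutputEqualsCold: true}
	// A false reuse: the edit should have missed the cache but did not.
	stale := EditedStable{ExpectedCacheMiss: true, ActualCacheMiss: false, OutputEqualsCold: true}

	fmt.Println(editedStableOK(good))  // true
	fmt.Println(editedStableOK(stale)) // false
}
```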

type FailureCases

type FailureCases struct {
	OverContext  bool `json:"over_context"`
	CancelDecode bool `json:"cancel_decode"`
}

FailureCases records whether each error path behaved as expected (true = handled as expected).

type Harness

type Harness interface {
	OpenSession(ctx context.Context) (transport.Session, error)
	Turn(stable, suffix string) (transport.PrefixInput, transport.SuffixInput)
}

Harness adapts a concrete backend to the benchmark runner. OpenSession opens a fresh (cold) session on the model under test; Turn builds the manifest-keyed prefix/suffix inputs for the given stable/volatile text (the caller owns prompt planning, so the runner stays backend-neutral).

type Report

type Report struct {
	Model          string `json:"model"`
	Backend        string `json:"backend"`
	Mode           string `json:"mode"`              // e.g. "live_prefix_reuse"
	Profile        string `json:"profile,omitempty"` // profile id / manifest digest
	BackendVersion string `json:"backend_version,omitempty"`

	ColdFullPrefill   ColdPrefill  `json:"cold_full_prefill"`
	WarmSamePrefix    WarmSame     `json:"warm_same_prefix"`
	WarmChangedSuffix WarmChanged  `json:"warm_changed_suffix"`
	EditedStable      EditedStable `json:"edited_stable_segment"`
	Decode            Decode       `json:"decode"`
	SuffixTTFTCurve   []SuffixTTFT `json:"suffix_ttft_curve,omitempty"`
	FailureCases      FailureCases `json:"failure_cases"`

	Notes []string `json:"notes,omitempty"`
}

Report is the full local-node benchmark report. Sections that a given run does not exercise are left zero-valued; `Notes` records why.
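
The struct tags determine which keys appear in the emitted JSON: fields tagged omitempty are dropped when empty, while untagged sections are always written, even when zero-valued. The sketch below illustrates this with a trimmed local mirror of Report (only a few of the documented fields, so it is not the full shape). The helpers encode and hasKey, and the model name "example-model", are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Decode and SuffixTTFT mirror the documented structs.
type Decode struct {
	OutputTokens int     `json:"output_tokens"`
	TokensPerSec float64 `json:"tokens_per_sec"`
}

type SuffixTTFT struct {
	SuffixTokens int     `json:"suffix_tokens"`
	TTFTMS       float64 `json:"ttft_ms"`
}

// Report is a trimmed local mirror of the documented struct: it keeps only
// enough fields to show how the JSON tags behave.
type Report struct {
	Model           string       `json:"model"`
	Backend         string       `json:"backend"`
	Mode            string       `json:"mode"`
	Profile         string       `json:"profile,omitempty"`
	Decode          Decode       `json:"decode"`
	SuffixTTFTCurve []SuffixTTFT `json:"suffix_ttft_curve,omitempty"`
	Notes           []string     `json:"notes,omitempty"`
}

// encode is a hypothetical helper that marshals the report to a JSON string.
func encode(r Report) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// hasKey reports whether the top-level JSON object contains key.
func hasKey(jsonText, key string) bool {
	var top map[string]json.RawMessage
	if err := json.Unmarshal([]byte(jsonText), &top); err != nil {
		return false
	}
	_, ok := top[key]
	return ok
}

func main() {
	r := Report{
		Model:   "example-model", // hypothetical model name
		Backend: "llama.cpp",
		Mode:    "live_prefix_reuse",
		Notes:   []string{"decode section not exercised in this run"},
	}
	out, err := encode(r)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
	// The zero-valued decode section is kept; empty omitempty fields are dropped.
	fmt.Println(hasKey(out, "decode"), hasKey(out, "profile"), hasKey(out, "suffix_ttft_curve")) // true false false
}
```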

func Run

func Run(ctx context.Context, h Harness, sc Scenario, meta Report) (Report, error)

Run executes the scenario against the harness and returns the filled report. It drives only the transport.Session contract, so any backend that implements it (llama.cpp, OpenVINO, the in-memory service) can be measured identically.

type Scenario

type Scenario struct {
	Stable        string
	Suffix        string
	ChangedSuffix string
	EditedStable  string
	SuffixCurve   []string
	MaxDecode     int
}

Scenario is the workspace-context shape to measure: a stable prefix reused across turns, a baseline and a changed suffix, an edited stable prefix that must miss the cache, and the suffix sizes for the headline TTFT curve.
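
One way to populate SuffixCurve is to generate suffixes of increasing length from a repeated unit of text. The sketch below uses a local mirror of Scenario; growingSuffixes and all of the sample strings are hypothetical. Note that string length is only a proxy here: the token count of each suffix depends on the tokenizer of the model under test.

```go
package main

import (
	"fmt"
	"strings"
)

// Scenario mirrors the documented struct so this sketch is self-contained.
type Scenario struct {
	Stable        string
	Suffix        string
	ChangedSuffix string
	EditedStable  string
	SuffixCurve   []string
	MaxDecode     int
}

// growingSuffixes is a hypothetical helper: it returns one suffix per entry
// in repeats, each made of unit repeated that many times. Negative counts
// are clamped to zero because strings.Repeat panics on them.
func growingSuffixes(unit string, repeats []int) []string {
	out := make([]string, 0, len(repeats))
	for _, n := range repeats {
		if n < 0 {
			n = 0
		}
		out = append(out, strings.Repeat(unit, n))
	}
	return out
}

func main() {
	stable := "// workspace context that stays fixed across turns\n"
	sc := Scenario{
		Stable:        stable,
		Suffix:        "Explain main().",
		ChangedSuffix: "Explain init().",
		// The edited prefix differs from Stable, so it must miss the cache.
		EditedStable: stable + "// one added line\n",
		SuffixCurve:  growingSuffixes("more volatile text. ", []int{1, 4, 16}),
		MaxDecode:    32,
	}
	for _, s := range sc.SuffixCurve {
		fmt.Println(len(s)) // byte lengths grow: 20, 80, 320
	}
}
```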

type SuffixTTFT

type SuffixTTFT struct {
	SuffixTokens int     `json:"suffix_tokens"`
	TTFTMS       float64 `json:"ttft_ms"`
}

SuffixTTFT is one point on the headline curve: TTFT as the changed suffix grows.
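
A reader of the curve may want a single number summarising how TTFT scales with suffix size. The sketch below computes the average slope between the smallest and largest points. msPerSuffixToken is a hypothetical helper and the sample values are invented for illustration; they are not measurements.

```go
package main

import (
	"fmt"
	"sort"
)

// SuffixTTFT mirrors the documented struct so this sketch is self-contained.
type SuffixTTFT struct {
	SuffixTokens int     `json:"suffix_tokens"`
	TTFTMS       float64 `json:"ttft_ms"`
}

// msPerSuffixToken is a hypothetical helper: it returns the average extra
// TTFT (in ms) per additional suffix token between the smallest and largest
// points on the curve. The bool is false when the slope is undefined
// (fewer than two points, or all points share one suffix size).
func msPerSuffixToken(curve []SuffixTTFT) (float64, bool) {
	if len(curve) < 2 {
		return 0, false
	}
	// Sort a copy so the caller's slice is left untouched.
	pts := append([]SuffixTTFT(nil), curve...)
	sort.Slice(pts, func(i, j int) bool {
		return pts[i].SuffixTokens < pts[j].SuffixTokens
	})
	first, last := pts[0], pts[len(pts)-1]
	dTok := last.SuffixTokens - first.SuffixTokens
	if dTok == 0 {
		return 0, false
	}
	return (last.TTFTMS - first.TTFTMS) / float64(dTok), true
}

func main() {
	// Illustrative values only.
	curve := []SuffixTTFT{
		{SuffixTokens: 64, TTFTMS: 20},
		{SuffixTokens: 16, TTFTMS: 8},
		{SuffixTokens: 32, TTFTMS: 12},
	}
	if slope, ok := msPerSuffixToken(curve); ok {
		fmt.Printf("%.2f ms per suffix token\n", slope) // prints "0.25 ms per suffix token"
	}
}
```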

type WarmChanged

type WarmChanged struct {
	CachedTokens     int     `json:"cached_tokens"`
	SuffixTokens     int     `json:"suffix_tokens"`
	MS               float64 `json:"ms"`
	TTFTMS           float64 `json:"ttft_ms"`
	OutputEqualsCold bool    `json:"output_equals_cold"`
}

WarmChanged measures keeping the stable prefix warm while re-prefilling a changed volatile suffix. OutputEqualsCold proves warm reuse did not corrupt the result versus a cold run of the same inputs.

type WarmSame

type WarmSame struct {
	CachedTokens int     `json:"cached_tokens"`
	NewTokens    int     `json:"new_tokens"`
	MS           float64 `json:"ms"`
	TTFTMS       float64 `json:"ttft_ms"`
	HitRate      float64 `json:"hit_rate"`
}

WarmSame measures re-issuing the identical stable prefix on a live session: a high HitRate means the resident KV was reused rather than recomputed.
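
The formula behind HitRate is not documented. One plausible reading, consistent with the description above, is the fraction of prompt tokens served from the resident KV cache. The sketch below assumes that definition; hitRate is a hypothetical helper and the token counts are invented for illustration.

```go
package main

import "fmt"

// WarmSame mirrors the documented struct so this sketch is self-contained.
type WarmSame struct {
	CachedTokens int     `json:"cached_tokens"`
	NewTokens    int     `json:"new_tokens"`
	MS           float64 `json:"ms"`
	TTFTMS       float64 `json:"ttft_ms"`
	HitRate      float64 `json:"hit_rate"`
}

// hitRate is a hypothetical helper (an assumed definition, not the
// package's): cached tokens divided by all tokens processed in the turn.
// It returns 0 when no tokens were processed.
func hitRate(cached, fresh int) float64 {
	total := cached + fresh
	if total <= 0 {
		return 0
	}
	return float64(cached) / float64(total)
}

func main() {
	w := WarmSame{CachedTokens: 96, NewTokens: 32}
	w.HitRate = hitRate(w.CachedTokens, w.NewTokens)
	fmt.Println(w.HitRate) // 0.75
}
```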
