benchreport

package benchreport

v0.32.3 (latest) | Published: Jun 18, 2026 | License: Apache-2.0 | Imports: 4 | Imported by: 0

Repository

github.com/contenox/runtime

Documentation

Overview

Package benchreport defines the common local-node benchmark report: one JSON shape emitted for every backend/model/hardware profile, so runtime latency and warm-reuse claims stay honest. Its runner drives the backend-neutral runtime/transport.Session contract (EnsurePrefix / PrefillSuffix / Decode), so the same harness measures both llama.cpp and OpenVINO. The shape follows docs/blueprints/local-coding-node-goals.md ("Required benchmark report").

Index

type ColdPrefill
type Decode
type EditedStable
type FailureCases
type Harness
type Report
- func Run(ctx context.Context, h Harness, sc Scenario, meta Report) (Report, error)
type Scenario
type SuffixTTFT
type WarmChanged
type WarmSame

Types

type ColdPrefill

type ColdPrefill struct {
	Tokens    int     `json:"tokens"`
	MS        float64 `json:"ms"`
	PromptTPS float64 `json:"prompt_tps"`
	TTFTMS    float64 `json:"ttft_ms"`
}

ColdPrefill measures a fresh prefill of the full stable prefix.
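The relationship between the ColdPrefill fields is not spelled out here; the sketch below assumes PromptTPS is simply the prefilled token count divided by the prefill wall time in seconds. It copies the documented fields into a local type rather than importing the package, and the promptTPS helper and sample numbers are hypothetical.

```go
package main

import "fmt"

// ColdPrefill mirrors the documented benchreport.ColdPrefill fields so this
// sketch compiles on its own.
type ColdPrefill struct {
	Tokens    int     `json:"tokens"`
	MS        float64 `json:"ms"`
	PromptTPS float64 `json:"prompt_tps"`
	TTFTMS    float64 `json:"ttft_ms"`
}

// promptTPS is a hypothetical helper. It ASSUMES prompt throughput is the
// prefilled token count divided by the prefill wall time in seconds.
func promptTPS(tokens int, ms float64) float64 {
	if ms <= 0 {
		return 0 // no timing recorded: report zero rather than +Inf
	}
	return float64(tokens) / (ms / 1000)
}

func main() {
	c := ColdPrefill{Tokens: 4096, MS: 2048} // illustrative values, not measurements
	c.PromptTPS = promptTPS(c.Tokens, c.MS)
	fmt.Printf("%.1f tok/s\n", c.PromptTPS) // 2000.0 tok/s
}
```

If the real runner derives PromptTPS differently (for example from backend-reported timings), trust the report's value over this reconstruction.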

type Decode

type Decode struct {
	OutputTokens int     `json:"output_tokens"`
	TokensPerSec float64 `json:"tokens_per_sec"`
}

Decode measures generation throughput.
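Because every backend emits the same JSON shape, a consumer can read a section with nothing more than encoding/json. This minimal sketch parses a "decode" object using a local copy of the documented fields; parseDecode and the sample JSON are hypothetical, not measured output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Decode mirrors the documented benchreport.Decode fields and JSON tags.
type Decode struct {
	OutputTokens int     `json:"output_tokens"`
	TokensPerSec float64 `json:"tokens_per_sec"`
}

// parseDecode is a hypothetical helper that reads the "decode" section of
// a report.
func parseDecode(raw []byte) (Decode, error) {
	var d Decode
	err := json.Unmarshal(raw, &d)
	return d, err
}

func main() {
	d, err := parseDecode([]byte(`{"output_tokens": 128, "tokens_per_sec": 42.5}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(d.OutputTokens, d.TokensPerSec) // 128 42.5
}
```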

type EditedStable

type EditedStable struct {
	ExpectedCacheMiss bool `json:"expected_cache_miss"`
	ActualCacheMiss   bool `json:"actual_cache_miss"`
	OutputEqualsCold  bool `json:"output_equals_cold"`
}

EditedStable proves that a changed stable segment causes a precise cache miss (the edited prefix is recomputed rather than falsely reused) and that the output still equals a cold run.
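One way a consumer might turn the three EditedStable flags into a single pass/fail verdict is sketched below. The editedStableOK rule (observed miss matches the expected miss, and output matches cold) is an assumption about how the flags are meant to be read, not a function the package provides.

```go
package main

import "fmt"

// EditedStable mirrors the documented benchreport.EditedStable fields.
type EditedStable struct {
	ExpectedCacheMiss bool `json:"expected_cache_miss"`
	ActualCacheMiss   bool `json:"actual_cache_miss"`
	OutputEqualsCold  bool `json:"output_equals_cold"`
}

// editedStableOK is a hypothetical acceptance check: the observed cache
// behaviour must match the expectation, and the output must match a cold run.
func editedStableOK(e EditedStable) bool {
	return e.ActualCacheMiss == e.ExpectedCacheMiss && e.OutputEqualsCold
}

func main() {
	// A stale prefix was reused even though the stable segment changed.
	falseReuse := EditedStable{ExpectedCacheMiss: true, ActualCacheMiss: false, OutputEqualsCold: true}
	fmt.Println(editedStableOK(falseReuse)) // false
}
```

Note that OutputEqualsCold alone is not enough: a falsely reused prefix can occasionally still produce matching output, which is why the cache-miss flags are checked separately.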

type FailureCases

type FailureCases struct {
	OverContext  bool `json:"over_context"`
	CancelDecode bool `json:"cancel_decode"`
}

FailureCases records whether each error path behaves correctly (true = handled as expected).
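A minimal sketch of summarizing FailureCases, using a local copy of the documented fields. Since Report leaves unexercised sections zero-valued, a false here can mean either "failed" or "not run"; the hypothetical failedCases helper cannot tell the two apart, so Notes should be consulted before treating a listed case as a regression.

```go
package main

import "fmt"

// FailureCases mirrors the documented benchreport.FailureCases fields.
type FailureCases struct {
	OverContext  bool `json:"over_context"`
	CancelDecode bool `json:"cancel_decode"`
}

// failedCases is a hypothetical helper that returns the JSON names of every
// case not marked as handled. A zero value may also mean "not exercised".
func failedCases(f FailureCases) []string {
	var failed []string
	if !f.OverContext {
		failed = append(failed, "over_context")
	}
	if !f.CancelDecode {
		failed = append(failed, "cancel_decode")
	}
	return failed
}

func main() {
	fmt.Println(failedCases(FailureCases{OverContext: true})) // [cancel_decode]
}
```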

type Harness

type Harness interface {
	OpenSession(ctx context.Context) (transport.Session, error)
	Turn(stable, suffix string) (transport.PrefixInput, transport.SuffixInput)
}

Harness adapts a concrete backend to the benchmark runner. OpenSession opens a fresh (cold) session on the model under test. Turn builds the manifest-keyed prefix and suffix inputs for the given stable and volatile text; the caller owns prompt planning, which keeps the runner backend-neutral.
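The fields of transport.PrefixInput and transport.SuffixInput are not documented on this page, so the sketch below uses hypothetical stand-in types. It only illustrates the idea behind a manifest-keyed Turn, assuming the key is a content digest of the stable text: identical stable text yields the same key (warm reuse is allowed), while an edited stable segment yields a different key (a cache miss is forced).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PrefixInput and SuffixInput are HYPOTHETICAL stand-ins for
// transport.PrefixInput and transport.SuffixInput.
type PrefixInput struct {
	Key  string // assumed manifest key: SHA-256 digest of the stable text
	Text string
}

type SuffixInput struct {
	Text string
}

// toyHarness implements only the Turn half of Harness and does no prompt
// templating, which a real harness would own.
type toyHarness struct{}

func (toyHarness) Turn(stable, suffix string) (PrefixInput, SuffixInput) {
	sum := sha256.Sum256([]byte(stable))
	return PrefixInput{Key: hex.EncodeToString(sum[:]), Text: stable}, SuffixInput{Text: suffix}
}

func main() {
	var h toyHarness
	p1, _ := h.Turn("system + repo map", "fix the bug")
	p2, _ := h.Turn("system + repo map", "now add a test")
	p3, _ := h.Turn("system + edited repo map", "fix the bug")
	fmt.Println(p1.Key == p2.Key, p1.Key == p3.Key) // true false
}
```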

type Report

type Report struct {
	Model          string `json:"model"`
	Backend        string `json:"backend"`
	Mode           string `json:"mode"`              // e.g. "live_prefix_reuse"
	Profile        string `json:"profile,omitempty"` // profile id / manifest digest
	BackendVersion string `json:"backend_version,omitempty"`

	ColdFullPrefill   ColdPrefill  `json:"cold_full_prefill"`
	WarmSamePrefix    WarmSame     `json:"warm_same_prefix"`
	WarmChangedSuffix WarmChanged  `json:"warm_changed_suffix"`
	EditedStable      EditedStable `json:"edited_stable_segment"`
	Decode            Decode       `json:"decode"`
	SuffixTTFTCurve   []SuffixTTFT `json:"suffix_ttft_curve,omitempty"`
	FailureCases      FailureCases `json:"failure_cases"`

	Notes []string `json:"notes,omitempty"`
}

Report is the full local-node benchmark report. Sections that a given run does not exercise are left zero-valued; `Notes` records why.
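The JSON tags determine what an unexercised section looks like on the wire. This sketch mirrors only a subset of the documented Report fields (the omission is for brevity) and marshals a report with no measurements: fields tagged omitempty (profile, backend_version, suffix_ttft_curve, notes) disappear when empty, while untagged sections such as decode are still emitted with zero values. encodeReport and the placeholder names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Decode struct {
	OutputTokens int     `json:"output_tokens"`
	TokensPerSec float64 `json:"tokens_per_sec"`
}

type SuffixTTFT struct {
	SuffixTokens int     `json:"suffix_tokens"`
	TTFTMS       float64 `json:"ttft_ms"`
}

// report mirrors a SUBSET of benchreport.Report, in the documented field
// order; the remaining sections are left out to keep the sketch short.
type report struct {
	Model           string       `json:"model"`
	Backend         string       `json:"backend"`
	Mode            string       `json:"mode"`
	Profile         string       `json:"profile,omitempty"`
	BackendVersion  string       `json:"backend_version,omitempty"`
	Decode          Decode       `json:"decode"`
	SuffixTTFTCurve []SuffixTTFT `json:"suffix_ttft_curve,omitempty"`
	Notes           []string     `json:"notes,omitempty"`
}

// encodeReport is a hypothetical helper returning the compact JSON form.
func encodeReport(r report) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	r := report{
		Model:   "example-model", // placeholder, not a real result
		Backend: "llama.cpp",
		Mode:    "live_prefix_reuse",
		Notes:   []string{"decode not exercised in this run"},
	}
	out, err := encodeReport(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because a zero-valued decode section is indistinguishable from a genuinely zero measurement, the Notes entry is what tells a reader the section was skipped.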

func Run

func Run(ctx context.Context, h Harness, sc Scenario, meta Report) (Report, error)

Run executes the scenario against the harness and returns the filled report. It drives only the transport.Session contract, so any backend that implements it (llama.cpp, OpenVINO, the in-memory service) can be measured identically.

type Scenario

type Scenario struct {
	Stable        string
	Suffix        string
	ChangedSuffix string
	EditedStable  string
	SuffixCurve   []string
	MaxDecode     int
}

Scenario is the workspace-context shape to measure: a stable prefix reused across turns, a baseline suffix and a changed suffix, an edited stable prefix that must miss the cache, and a series of suffixes of increasing size for the headline TTFT curve.
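A minimal sketch of filling in a Scenario, using a local copy of the documented fields. The prompt texts are placeholders, the MaxDecode value is arbitrary, and growingSuffixes is a hypothetical helper that builds curve suffixes by repeating a filler line; the real token counts recorded in SuffixTTFT depend on the model's tokenizer, not on the repeat counts used here.

```go
package main

import (
	"fmt"
	"strings"
)

// Scenario mirrors the documented benchreport.Scenario fields.
type Scenario struct {
	Stable        string
	Suffix        string
	ChangedSuffix string
	EditedStable  string
	SuffixCurve   []string
	MaxDecode     int
}

// growingSuffixes returns one suffix per count, each made of line repeated
// that many times, so successive curve points grow in size.
func growingSuffixes(line string, counts ...int) []string {
	out := make([]string, 0, len(counts))
	for _, n := range counts {
		out = append(out, strings.Repeat(line, n))
	}
	return out
}

func main() {
	sc := Scenario{
		Stable:        "<system prompt + repo map>",
		Suffix:        "Explain main.go.",
		ChangedSuffix: "Explain util.go.",
		EditedStable:  "<system prompt + edited repo map>",
		SuffixCurve:   growingSuffixes("// filler line\n", 1, 8, 64),
		MaxDecode:     64, // arbitrary illustrative value
	}
	for _, s := range sc.SuffixCurve {
		fmt.Println(len(s)) // suffix size in bytes
	}
}
```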

type SuffixTTFT

type SuffixTTFT struct {
	SuffixTokens int     `json:"suffix_tokens"`
	TTFTMS       float64 `json:"ttft_ms"`
}

SuffixTTFT is one point on the headline curve: TTFT as the changed suffix grows.
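A single number that summarizes the curve is its slope: the marginal TTFT cost of each extra suffix token. The sketch below fits TTFT = a + b * tokens by ordinary least squares over a local copy of SuffixTTFT; treating the curve as roughly linear is an assumption, and ttftSlope and the sample points are hypothetical.

```go
package main

import "fmt"

// SuffixTTFT mirrors the documented benchreport.SuffixTTFT fields.
type SuffixTTFT struct {
	SuffixTokens int     `json:"suffix_tokens"`
	TTFTMS       float64 `json:"ttft_ms"`
}

// ttftSlope fits TTFT = a + b*tokens by ordinary least squares and returns
// b in ms per suffix token. ok is false for fewer than two points or when
// every point has the same token count.
func ttftSlope(curve []SuffixTTFT) (b float64, ok bool) {
	n := float64(len(curve))
	if n < 2 {
		return 0, false
	}
	var sx, sy, sxx, sxy float64
	for _, p := range curve {
		x := float64(p.SuffixTokens)
		sx += x
		sy += p.TTFTMS
		sxx += x * x
		sxy += x * p.TTFTMS
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0, false
	}
	return (n*sxy - sx*sy) / den, true
}

func main() {
	curve := []SuffixTTFT{{64, 30}, {256, 78}, {1024, 270}} // illustrative numbers
	b, _ := ttftSlope(curve)
	fmt.Printf("%.3f ms/token\n", b) // 0.250 ms/token
}
```

A flat slope means suffix size barely affects TTFT, which is the behaviour warm prefix reuse is meant to deliver; a slope close to the cold prefill rate suggests the suffix path is doing more work than expected.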

type WarmChanged

type WarmChanged struct {
	CachedTokens     int     `json:"cached_tokens"`
	SuffixTokens     int     `json:"suffix_tokens"`
	MS               float64 `json:"ms"`
	TTFTMS           float64 `json:"ttft_ms"`
	OutputEqualsCold bool    `json:"output_equals_cold"`
}

WarmChanged measures keeping the stable prefix warm while re-prefilling a changed volatile suffix. OutputEqualsCold proves warm reuse did not corrupt the result versus a cold run of the same inputs.
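A natural headline is the TTFT speedup of the warm changed-suffix turn over the cold full prefill. The sketch below, a hypothetical warmSpeedup helper over a local copy of WarmChanged, computes cold TTFT divided by warm TTFT and refuses to report a speedup when OutputEqualsCold is false, since a fast but divergent result is not a valid reuse claim. The sample timings are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// WarmChanged mirrors the documented benchreport.WarmChanged fields.
type WarmChanged struct {
	CachedTokens     int     `json:"cached_tokens"`
	SuffixTokens     int     `json:"suffix_tokens"`
	MS               float64 `json:"ms"`
	TTFTMS           float64 `json:"ttft_ms"`
	OutputEqualsCold bool    `json:"output_equals_cold"`
}

// warmSpeedup returns coldTTFTMS / w.TTFTMS, or an error when the warm
// output diverged from cold or the warm TTFT was not recorded.
func warmSpeedup(coldTTFTMS float64, w WarmChanged) (float64, error) {
	if !w.OutputEqualsCold {
		return 0, errors.New("warm output differs from cold run; speedup not meaningful")
	}
	if w.TTFTMS <= 0 {
		return 0, errors.New("warm TTFT not measured")
	}
	return coldTTFTMS / w.TTFTMS, nil
}

func main() {
	w := WarmChanged{CachedTokens: 4000, SuffixTokens: 96, MS: 40, TTFTMS: 50, OutputEqualsCold: true}
	s, err := warmSpeedup(2000, w) // 2000 ms would come from ColdFullPrefill.TTFTMS
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.0fx faster TTFT\n", s) // 40x faster TTFT
}
```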

type WarmSame

type WarmSame struct {
	CachedTokens int     `json:"cached_tokens"`
	NewTokens    int     `json:"new_tokens"`
	MS           float64 `json:"ms"`
	TTFTMS       float64 `json:"ttft_ms"`
	HitRate      float64 `json:"hit_rate"`
}

WarmSame measures re-issuing the identical stable prefix on a live session: a high HitRate means the resident KV cache was reused rather than recomputed.
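How HitRate is computed is not stated here. A plausible reading, assumed in this sketch, is the fraction of prefix tokens served from the resident cache: CachedTokens / (CachedTokens + NewTokens). The hitRate helper is hypothetical and works over a local copy of the documented fields.

```go
package main

import "fmt"

// WarmSame mirrors the documented benchreport.WarmSame fields.
type WarmSame struct {
	CachedTokens int     `json:"cached_tokens"`
	NewTokens    int     `json:"new_tokens"`
	MS           float64 `json:"ms"`
	TTFTMS       float64 `json:"ttft_ms"`
	HitRate      float64 `json:"hit_rate"`
}

// hitRate ASSUMES HitRate = cached / (cached + fresh); it returns 0 when
// no tokens were processed at all.
func hitRate(cached, fresh int) float64 {
	total := cached + fresh
	if total == 0 {
		return 0
	}
	return float64(cached) / float64(total)
}

func main() {
	w := WarmSame{CachedTokens: 4032, NewTokens: 64}
	w.HitRate = hitRate(w.CachedTokens, w.NewTokens)
	fmt.Println(w.HitRate) // 0.984375
}
```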
