handlers

package handlers

Version: v1.2.0 (Latest)

Warning: This package is not in the latest version of its module.

Published: Feb 15, 2026  License: Apache-2.0  Imports: 8  Imported by: 0

Documentation

Overview

Package handlers provides eval type handler implementations.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ContainsAnyHandler

type ContainsAnyHandler struct{}

ContainsAnyHandler checks that at least one assistant message contains at least one of the specified patterns. Params: patterns []string (case-insensitive matching).

func (*ContainsAnyHandler) Eval

func (h *ContainsAnyHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (_ *evals.EvalResult, _ error)

Eval checks assistant messages for any matching pattern.
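The matching rule above can be sketched in a few lines of plain Go. This is a minimal illustration under stated assumptions, not the package's code: assistant messages are modeled as a plain []string (the message type inside evals.EvalContext is not shown on this page), and case-insensitive matching is approximated with strings.ToLower plus strings.Contains. The helper name containsAny is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAny is a hypothetical helper that reports whether at least one
// assistant message contains at least one of the patterns, ignoring case.
// Assumption: messages are plain strings; the real handler reads them from
// evals.EvalContext.
func containsAny(messages, patterns []string) bool {
	for _, msg := range messages {
		lowerMsg := strings.ToLower(msg)
		for _, p := range patterns {
			if strings.Contains(lowerMsg, strings.ToLower(p)) {
				return true // one hit in any message is enough
			}
		}
	}
	return false
}

func main() {
	msgs := []string{"Hello there.", "Your order has SHIPPED."}
	fmt.Println(containsAny(msgs, []string{"shipped", "delivered"})) // true
	fmt.Println(containsAny(msgs, []string{"refund"}))               // false
}
```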

func (*ContainsAnyHandler) Type

func (h *ContainsAnyHandler) Type() string

Type returns the eval type identifier.

type ContainsHandler

type ContainsHandler struct{}

ContainsHandler checks if CurrentOutput contains all specified patterns (case-insensitive). Params: patterns []string.

func (*ContainsHandler) Eval

func (h *ContainsHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval checks that all patterns appear in the current output.
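A similar sketch for the all-patterns rule: rather than returning a bare bool, the hypothetical helper missingPatterns returns the patterns that were not found, which is one way a handler could produce an informative failure message. Whether ContainsHandler actually reports missing patterns this way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// missingPatterns is a hypothetical helper: it returns the patterns that do
// NOT appear in output (case-insensitive). An empty result means every
// pattern was found, i.e. the "contains all" check passes.
func missingPatterns(output string, patterns []string) []string {
	lowerOut := strings.ToLower(output)
	var missing []string
	for _, p := range patterns {
		if !strings.Contains(lowerOut, strings.ToLower(p)) {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	out := "The capital of France is Paris."
	fmt.Println(missingPatterns(out, []string{"paris", "france"})) // []
	fmt.Println(missingPatterns(out, []string{"paris", "berlin"})) // [berlin]
}
```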

func (*ContainsHandler) Type

func (h *ContainsHandler) Type() string

Type returns the eval type identifier.

type ContentExcludesHandler

type ContentExcludesHandler struct{}

ContentExcludesHandler checks that NONE of the assistant messages across the full conversation contain any of the forbidden patterns. Params: patterns []string (case-insensitive matching).

func (*ContentExcludesHandler) Eval

func (h *ContentExcludesHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (_ *evals.EvalResult, _ error)

Eval checks all assistant messages for forbidden patterns.
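The inverse check can be sketched the same way. The hypothetical findForbidden helper below records which message and which pattern triggered each violation; messages are again modeled as a []string, and the violation type is illustrative only, not part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// violation records where a forbidden pattern was found (hypothetical type).
type violation struct {
	MessageIndex int
	Pattern      string
}

// findForbidden scans every assistant message and returns each
// (message, pattern) pair that matches, ignoring case.
// The check passes only when the returned slice is empty.
func findForbidden(messages, patterns []string) []violation {
	var found []violation
	for i, msg := range messages {
		lowerMsg := strings.ToLower(msg)
		for _, p := range patterns {
			if strings.Contains(lowerMsg, strings.ToLower(p)) {
				found = append(found, violation{MessageIndex: i, Pattern: p})
			}
		}
	}
	return found
}

func main() {
	msgs := []string{"Sure, happy to help.", "As an AI language model, I cannot..."}
	v := findForbidden(msgs, []string{"as an ai", "lorem ipsum"})
	fmt.Println(len(v) == 0, v) // false [{1 as an ai}]
}
```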

func (*ContentExcludesHandler) Type

func (h *ContentExcludesHandler) Type() string

Type returns the eval type identifier.

type CosineSimilarityHandler

type CosineSimilarityHandler struct{}

CosineSimilarityHandler computes the cosine similarity between a target embedding, read from evalCtx.Metadata["embedding"], and a reference embedding. Params: reference []float64, min_similarity float64.

func (*CosineSimilarityHandler) Eval

func (h *CosineSimilarityHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval computes the cosine similarity and checks it against the min_similarity threshold.
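Cosine similarity is dot(a, b) / (‖a‖ · ‖b‖), which ranges from -1 to 1 for non-zero vectors (1 when they point the same way). A minimal sketch of the computation and threshold check follows. Reporting mismatched dimensions and zero vectors as errors, and using >= for the min_similarity comparison, are assumptions; this page documents neither.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (‖a‖ · ‖b‖).
// Assumption: mismatched lengths and zero vectors are treated as errors;
// the real handler's behaviour in those cases is not documented here.
func cosineSimilarity(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("dimension mismatch: %d vs %d", len(a), len(b))
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("zero-length vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

func main() {
	target := []float64{1, 0, 1}    // would come from Metadata["embedding"]
	reference := []float64{1, 1, 0} // params["reference"]
	minSimilarity := 0.4            // params["min_similarity"]

	sim, err := cosineSimilarity(target, reference)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Assumption: a score equal to the threshold passes.
	fmt.Printf("similarity=%.2f passed=%v\n", sim, sim >= minSimilarity) // similarity=0.50 passed=true
}
```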

func (*CosineSimilarityHandler) Type

func (h *CosineSimilarityHandler) Type() string

Type returns the eval type identifier.

type JSONSchemaHandler

type JSONSchemaHandler struct{}

JSONSchemaHandler validates CurrentOutput against a JSON schema. Params: schema map[string]any.

func (*JSONSchemaHandler) Eval

func (h *JSONSchemaHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval validates the current output against the provided JSON schema.

func (*JSONSchemaHandler) Type

func (h *JSONSchemaHandler) Type() string

Type returns the eval type identifier.

type JSONValidHandler

type JSONValidHandler struct{}

JSONValidHandler checks that CurrentOutput is valid JSON. It takes no params (the params argument is ignored).

func (*JSONValidHandler) Eval

func (h *JSONValidHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	_ map[string]any,
) (result *evals.EvalResult, err error)

Eval checks that the current output is parseable JSON.
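Go's standard library already provides this check: json.Valid from encoding/json reports whether a byte slice is a syntactically valid JSON encoding. The sketch below assumes the handler behaves like json.Valid on the raw output; whether it strips Markdown code fences or surrounding prose first is not documented, and the helper name isValidJSON is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isValidJSON reports whether output parses as a single JSON value.
// This sketch does no pre-processing, so output wrapped in a Markdown code
// fence or preceded by prose is rejected.
func isValidJSON(output string) bool {
	return json.Valid([]byte(output))
}

func main() {
	fmt.Println(isValidJSON(`{"answer": 42}`))       // true
	fmt.Println(isValidJSON(`{"answer": 42,}`))      // false (trailing comma)
	fmt.Println(isValidJSON(`Sure! {"answer": 42}`)) // false (leading prose)
}
```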

func (*JSONValidHandler) Type

func (h *JSONValidHandler) Type() string

Type returns the eval type identifier.

type JudgeOpts

type JudgeOpts struct {
	// Content is the text being evaluated (assistant response or full conversation).
	Content string

	// Criteria describes what the judge should evaluate (e.g. "Is the response helpful?").
	Criteria string

	// Rubric provides detailed scoring guidance (optional).
	Rubric string

	// Model specifies which model to use for judging (optional, provider decides default).
	Model string

	// SystemPrompt overrides the default judge system prompt (optional).
	SystemPrompt string

	// MinScore is the minimum score threshold for passing (optional).
	MinScore *float64

	// Extra holds additional parameters for provider-specific features.
	Extra map[string]any
}

JudgeOpts configures a judge evaluation request.

type JudgeProvider

type JudgeProvider interface {
	// Judge sends the evaluation prompt to an LLM and returns
	// the parsed verdict. Implementations handle provider selection,
	// prompt formatting, and response parsing.
	Judge(ctx context.Context, opts JudgeOpts) (*JudgeResult, error)
}

JudgeProvider abstracts LLM access for judge-based evaluations. Arena, the SDK, and eval workers each supply their own implementation, wired to their respective provider infrastructure.

type JudgeResult

type JudgeResult struct {
	// Passed indicates whether the content met the evaluation criteria.
	Passed bool

	// Score is the numerical score assigned by the judge (typically 0.0-1.0).
	Score float64

	// Reasoning explains the judge's evaluation.
	Reasoning string

	// Raw is the unprocessed LLM response text.
	Raw string
}

JudgeResult captures the output of an LLM judge evaluation.

type LLMJudgeHandler

type LLMJudgeHandler struct{}

LLMJudgeHandler evaluates a single assistant turn using an LLM judge. The JudgeProvider must be supplied in evalCtx.Metadata["judge_provider"].

Params:

  • criteria (string, required): what to evaluate
  • rubric (string, optional): detailed scoring guidance
  • model (string, optional): model override for the judge
  • system_prompt (string, optional): override default system prompt
  • min_score (float64, optional): minimum score to pass
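One plausible way these params map onto JudgeOpts is sketched below. optsFromParams is a hypothetical helper, the local JudgeOpts omits Extra for brevity, and returning an error when criteria is missing is an assumption consistent with it being required. min_score is stored as a *float64 so an unset threshold can be told apart from 0, matching the pointer type of JudgeOpts.MinScore.

```go
package main

import (
	"errors"
	"fmt"
)

// JudgeOpts mirrors the documented handlers.JudgeOpts (Extra omitted).
type JudgeOpts struct {
	Content      string
	Criteria     string
	Rubric       string
	Model        string
	SystemPrompt string
	MinScore     *float64
}

// optsFromParams is a hypothetical helper showing how the documented params
// could map onto JudgeOpts. Only "criteria" is required; optional string
// keys are copied when present and of type string.
func optsFromParams(content string, params map[string]any) (JudgeOpts, error) {
	criteria, _ := params["criteria"].(string)
	if criteria == "" {
		return JudgeOpts{}, errors.New("criteria (string) is required")
	}
	opts := JudgeOpts{Content: content, Criteria: criteria}
	opts.Rubric, _ = params["rubric"].(string)
	opts.Model, _ = params["model"].(string)
	opts.SystemPrompt, _ = params["system_prompt"].(string)
	// A pointer keeps "no threshold" distinct from a threshold of 0.
	if v, ok := params["min_score"].(float64); ok {
		opts.MinScore = &v
	}
	return opts, nil
}

func main() {
	params := map[string]any{
		"criteria":  "Is the response polite and on-topic?",
		"rubric":    "1.0 = fully polite; 0.0 = rude or off-topic",
		"min_score": 0.7,
	}
	opts, err := optsFromParams("Thanks for asking! Here is the answer...", params)
	if err != nil {
		panic(err)
	}
	fmt.Println(opts.Criteria, *opts.MinScore)
}
```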

func (*LLMJudgeHandler) Eval

func (h *LLMJudgeHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval runs the LLM judge on the current assistant output.

func (*LLMJudgeHandler) Type

func (h *LLMJudgeHandler) Type() string

Type returns the eval type identifier.

type LLMJudgeSessionHandler

type LLMJudgeSessionHandler struct{}

LLMJudgeSessionHandler evaluates an entire conversation using an LLM judge. It concatenates all assistant messages into a single content string for evaluation.

The JudgeProvider must be supplied in evalCtx.Metadata["judge_provider"].

Params:

  • criteria (string, required): what to evaluate
  • rubric (string, optional): detailed scoring guidance
  • model (string, optional): model override for the judge
  • system_prompt (string, optional): override default system prompt
  • min_score (float64, optional): minimum score to pass

func (*LLMJudgeSessionHandler) Eval

func (h *LLMJudgeSessionHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval runs the LLM judge on all assistant messages in the session.
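The concatenation step can be illustrated as follows. The message type is a hypothetical stand-in for the conversation's message representation, and the "\n\n" separator is an assumption: the documentation only says that assistant messages are concatenated into a single string.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a hypothetical stand-in for the conversation message type.
type message struct {
	Role    string
	Content string
}

// sessionContent keeps only assistant messages and joins them into one
// string for the judge. The "\n\n" separator is an assumption.
func sessionContent(msgs []message) string {
	var parts []string
	for _, m := range msgs {
		if m.Role == "assistant" {
			parts = append(parts, m.Content)
		}
	}
	return strings.Join(parts, "\n\n")
}

func main() {
	conv := []message{
		{Role: "user", Content: "Hi"},
		{Role: "assistant", Content: "Hello! How can I help?"},
		{Role: "user", Content: "Track my order"},
		{Role: "assistant", Content: "It ships tomorrow."},
	}
	fmt.Printf("%q\n", sessionContent(conv))
}
```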

func (*LLMJudgeSessionHandler) Type

func (h *LLMJudgeSessionHandler) Type() string

Type returns the eval type identifier.

type LatencyBudgetHandler

type LatencyBudgetHandler struct{}

LatencyBudgetHandler checks Metadata["latency_ms"] against a maximum latency budget. Params: max_ms float64.

func (*LatencyBudgetHandler) Eval

func (h *LatencyBudgetHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval checks that the latency is within budget.
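A sketch of the budget check, assuming the latency is stored in metadata as a number. The hypothetical withinBudget helper accepts float64, int, or int64 values (the actual stored type is not documented) and treats a latency exactly equal to max_ms as passing, which is also an assumption.

```go
package main

import "fmt"

// withinBudget is a hypothetical helper mirroring the documented check:
// it reads metadata["latency_ms"] and compares it to maxMs.
// Assumptions: the value may be float64, int or int64, and a latency equal
// to the budget counts as passing.
func withinBudget(metadata map[string]any, maxMs float64) (bool, error) {
	var latency float64
	switch v := metadata["latency_ms"].(type) {
	case float64:
		latency = v
	case int:
		latency = float64(v)
	case int64:
		latency = float64(v)
	default:
		return false, fmt.Errorf("latency_ms missing or not numeric (got %T)", v)
	}
	return latency <= maxMs, nil
}

func main() {
	ok, err := withinBudget(map[string]any{"latency_ms": 850.0}, 1000)
	fmt.Println(ok, err) // true <nil>
}
```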

func (*LatencyBudgetHandler) Type

func (h *LatencyBudgetHandler) Type() string

Type returns the eval type identifier.

type RegexHandler

type RegexHandler struct{}

RegexHandler checks if CurrentOutput matches a regex pattern. Params: pattern string.

func (*RegexHandler) Eval

func (h *RegexHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval checks that the current output matches the regex pattern.
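A sketch using Go's regexp package, which implements RE2 syntax; this page does not say which regex engine the handler uses, so treat that as an assumption. In this sketch matching is unanchored (the pattern may match anywhere in the output) and case-sensitive unless the pattern carries the (?i) flag. The helper name matchesPattern is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesPattern compiles pattern with Go's regexp package (RE2 syntax) and
// reports whether it matches anywhere in output. An invalid pattern is
// returned as an error rather than treated as a failed match.
func matchesPattern(output, pattern string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return re.MatchString(output), nil
}

func main() {
	out := "Order #A1234 confirmed."
	fmt.Println(matchesPattern(out, `#[A-Z]\d{4}`))   // true <nil>
	fmt.Println(matchesPattern(out, `(?i)CONFIRMED`)) // true <nil>
}
```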

func (*RegexHandler) Type

func (h *RegexHandler) Type() string

Type returns the eval type identifier.

type ToolArgsExcludedSessionHandler

type ToolArgsExcludedSessionHandler struct{}

ToolArgsExcludedSessionHandler checks that a tool was NOT called with specific argument values across the session. Params: tool_name string, excluded_args map[string]any.

func (*ToolArgsExcludedSessionHandler) Eval

func (h *ToolArgsExcludedSessionHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (_ *evals.EvalResult, _ error)

Eval ensures the tool was never called with excluded args.
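A sketch of the exclusion check over a session's tool calls. The toolCall type and firstExcludedCall helper are hypothetical, values are compared with reflect.DeepEqual, and the rule that a call is a violation only when all excluded_args pairs match is an assumption (the handler could equally flag a call that matches any single pair).

```go
package main

import (
	"fmt"
	"reflect"
)

// toolCall is a hypothetical stand-in for a recorded tool invocation.
type toolCall struct {
	Name string
	Args map[string]any
}

// firstExcludedCall returns the index of the first call to toolName whose
// arguments contain every key/value pair in excluded, or -1 if none does.
// Assumption: ALL excluded pairs must match for a call to count as a violation.
func firstExcludedCall(calls []toolCall, toolName string, excluded map[string]any) int {
	for i, c := range calls {
		if c.Name != toolName {
			continue
		}
		all := true
		for k, want := range excluded {
			got, ok := c.Args[k]
			if !ok || !reflect.DeepEqual(got, want) {
				all = false
				break
			}
		}
		if all {
			return i
		}
	}
	return -1
}

func main() {
	session := []toolCall{
		{Name: "search", Args: map[string]any{"query": "weather"}},
		{Name: "delete_file", Args: map[string]any{"path": "/tmp/x", "force": true}},
	}
	idx := firstExcludedCall(session, "delete_file", map[string]any{"force": true})
	fmt.Println("passed:", idx == -1, "offending call:", idx) // passed: false offending call: 1
}
```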

func (*ToolArgsExcludedSessionHandler) Type

func (h *ToolArgsExcludedSessionHandler) Type() string

Type returns the eval type identifier.

type ToolArgsHandler

type ToolArgsHandler struct{}

ToolArgsHandler checks that a tool was called with specific args. Params: tool_name string, expected_args map[string]any.

func (*ToolArgsHandler) Eval

func (h *ToolArgsHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval checks that the specified tool was called with matching args.

func (*ToolArgsHandler) Type

func (h *ToolArgsHandler) Type() string

Type returns the eval type identifier.

type ToolArgsSessionHandler

type ToolArgsSessionHandler struct{}

ToolArgsSessionHandler checks that a tool was called with specific arguments across the session. Params: tool_name string, expected_args map[string]any.

func (*ToolArgsSessionHandler) Eval

func (h *ToolArgsSessionHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (_ *evals.EvalResult, _ error)

Eval checks tool calls for expected arguments.

func (*ToolArgsSessionHandler) Type

func (h *ToolArgsSessionHandler) Type() string

Type returns the eval type identifier.

type ToolsCalledHandler

type ToolsCalledHandler struct{}

ToolsCalledHandler checks that specific tools were called. Params: tool_names []string, min_calls int (optional).

func (*ToolsCalledHandler) Eval

func (h *ToolsCalledHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval checks that all expected tools were called.

func (*ToolsCalledHandler) Type

func (h *ToolsCalledHandler) Type() string

Type returns the eval type identifier.

type ToolsCalledSessionHandler

type ToolsCalledSessionHandler struct{}

ToolsCalledSessionHandler checks that specific tools were called across the full session. Params: tool_names []string, min_calls int (optional, default 1).

func (*ToolsCalledSessionHandler) Eval

func (h *ToolsCalledSessionHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (_ *evals.EvalResult, _ error)

Eval checks that all required tools were called at least min_calls times.
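Counting calls per tool is a small map operation. In the hypothetical underCalled helper below, tool calls are reduced to a []string of tool names, and any min_calls below 1 (including Go's zero value for an unset parameter) falls back to the documented default of 1; that fallback rule is an interpretation of the default, not confirmed behavior.

```go
package main

import "fmt"

// underCalled is a hypothetical helper: it counts how many times each tool
// name appears in calledTools and returns the required tools that were
// called fewer than minCalls times, with their actual counts.
// A minCalls below 1 falls back to the documented default of 1.
func underCalled(calledTools, required []string, minCalls int) map[string]int {
	if minCalls < 1 {
		minCalls = 1
	}
	counts := make(map[string]int)
	for _, name := range calledTools {
		counts[name]++
	}
	short := make(map[string]int)
	for _, name := range required {
		if counts[name] < minCalls {
			short[name] = counts[name]
		}
	}
	return short
}

func main() {
	called := []string{"search", "search", "fetch_page"}
	fmt.Println(underCalled(called, []string{"search", "fetch_page"}, 2))
	// map[fetch_page:1] -> fails: fetch_page was called once, fewer than 2
}
```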

func (*ToolsCalledSessionHandler) Type

func (h *ToolsCalledSessionHandler) Type() string

Type returns the eval type identifier.

type ToolsNotCalledHandler

type ToolsNotCalledHandler struct{}

ToolsNotCalledHandler checks that specific tools were NOT called. Params: tool_names []string.

func (*ToolsNotCalledHandler) Eval

func (h *ToolsNotCalledHandler) Eval(
	_ context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (result *evals.EvalResult, err error)

Eval checks that none of the forbidden tools were called.
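A set-based sketch of the negative check: the forbidden names are loaded into a map, and each call is looked up once. forbiddenCalled is a hypothetical helper; it assumes exact, case-sensitive tool-name matching and reports each offending tool once, in the order it was first called.

```go
package main

import "fmt"

// forbiddenCalled returns, in first-call order and without duplicates, the
// forbidden tool names that appear in calledTools. An empty result means
// the "not called" check passes.
func forbiddenCalled(calledTools, forbidden []string) []string {
	banned := make(map[string]bool, len(forbidden))
	for _, name := range forbidden {
		banned[name] = true
	}
	seen := make(map[string]bool)
	var hits []string
	for _, name := range calledTools {
		if banned[name] && !seen[name] {
			seen[name] = true
			hits = append(hits, name)
		}
	}
	return hits
}

func main() {
	called := []string{"search", "send_email", "search", "send_email"}
	fmt.Println(forbiddenCalled(called, []string{"send_email", "delete_file"})) // [send_email]
}
```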

func (*ToolsNotCalledHandler) Type

func (h *ToolsNotCalledHandler) Type() string

Type returns the eval type identifier.

type ToolsNotCalledSessionHandler

type ToolsNotCalledSessionHandler struct{}

ToolsNotCalledSessionHandler checks that specific tools were NOT called anywhere in the session. Params: tool_names []string.

func (*ToolsNotCalledSessionHandler) Eval

func (h *ToolsNotCalledSessionHandler) Eval(
	ctx context.Context,
	evalCtx *evals.EvalContext,
	params map[string]any,
) (_ *evals.EvalResult, _ error)

Eval ensures forbidden tools were never called across the session.

func (*ToolsNotCalledSessionHandler) Type

func (h *ToolsNotCalledSessionHandler) Type() string

Type returns the eval type identifier.
