Documentation ¶
Overview ¶
Package handlers provides implementations of the eval type handlers.
Index ¶
- type ContainsAnyHandler
- type ContainsHandler
- type ContentExcludesHandler
- type CosineSimilarityHandler
- type JSONSchemaHandler
- type JSONValidHandler
- type JudgeOpts
- type JudgeProvider
- type JudgeResult
- type LLMJudgeHandler
- type LLMJudgeSessionHandler
- type LatencyBudgetHandler
- type RegexHandler
- type ToolArgsExcludedSessionHandler
- type ToolArgsHandler
- type ToolArgsSessionHandler
- type ToolsCalledHandler
- type ToolsCalledSessionHandler
- type ToolsNotCalledHandler
- type ToolsNotCalledSessionHandler
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ContainsAnyHandler ¶
type ContainsAnyHandler struct{}
ContainsAnyHandler checks that at least one assistant message contains at least one of the specified patterns. Params: patterns []string (case-insensitive matching).
func (*ContainsAnyHandler) Eval ¶
func (h *ContainsAnyHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (_ *evals.EvalResult, _ error)
Eval checks assistant messages for any matching pattern.
func (*ContainsAnyHandler) Type ¶
func (h *ContainsAnyHandler) Type() string
Type returns the eval type identifier.
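The matching rule described for ContainsAnyHandler can be sketched without the package itself. The helper below is hypothetical (anyMessageContainsAny is not part of handlers) and assumes assistant messages are available as plain strings; it only illustrates case-insensitive "any message, any pattern" matching.

```go
package main

import (
	"fmt"
	"strings"
)

// anyMessageContainsAny reports whether at least one message contains at
// least one pattern, ignoring case. Hypothetical helper for illustration;
// it is not the handler's actual implementation.
func anyMessageContainsAny(messages, patterns []string) bool {
	for _, msg := range messages {
		lower := strings.ToLower(msg)
		for _, p := range patterns {
			if strings.Contains(lower, strings.ToLower(p)) {
				return true
			}
		}
	}
	return false
}

func main() {
	messages := []string{"Let me check that.", "Your refund was approved."}
	fmt.Println(anyMessageContainsAny(messages, []string{"REFUND", "credit"})) // true
	fmt.Println(anyMessageContainsAny(messages, []string{"denied"}))          // false
}
```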
type ContainsHandler ¶
type ContainsHandler struct{}
ContainsHandler checks if CurrentOutput contains all specified patterns (case-insensitive). Params: patterns []string.
func (*ContainsHandler) Eval ¶
func (h *ContainsHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval checks that all patterns appear in the current output.
func (*ContainsHandler) Type ¶
func (h *ContainsHandler) Type() string
Type returns the eval type identifier.
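As a rough illustration of the "all patterns must appear" rule for ContainsHandler, the sketch below returns the patterns that are missing from an output string. The name missingPatterns is an assumption made for this example, and the real handler may report failures differently.

```go
package main

import (
	"fmt"
	"strings"
)

// missingPatterns returns the patterns that do not occur in output,
// compared case-insensitively. An empty result means every pattern matched.
// Hypothetical helper; not part of the handlers package.
func missingPatterns(output string, patterns []string) []string {
	lower := strings.ToLower(output)
	var missing []string
	for _, p := range patterns {
		if !strings.Contains(lower, strings.ToLower(p)) {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	out := "Order #42 has SHIPPED"
	fmt.Println(missingPatterns(out, []string{"order", "shipped"}))           // []
	fmt.Println(missingPatterns(out, []string{"order", "shipped", "refund"})) // [refund]
}
```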
type ContentExcludesHandler ¶
type ContentExcludesHandler struct{}
ContentExcludesHandler checks that NONE of the assistant messages across the full conversation contain any of the forbidden patterns. Params: patterns []string (case-insensitive matching).
func (*ContentExcludesHandler) Eval ¶
func (h *ContentExcludesHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (_ *evals.EvalResult, _ error)
Eval checks all assistant messages for forbidden patterns.
func (*ContentExcludesHandler) Type ¶
func (h *ContentExcludesHandler) Type() string
Type returns the eval type identifier.
type CosineSimilarityHandler ¶
type CosineSimilarityHandler struct{}
CosineSimilarityHandler computes the cosine similarity between a reference embedding and a target embedding. Params: reference []float64, min_similarity float64. The target embedding is read from Metadata["embedding"].
func (*CosineSimilarityHandler) Eval ¶
func (h *CosineSimilarityHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval computes cosine similarity and checks against threshold.
func (*CosineSimilarityHandler) Type ¶
func (h *CosineSimilarityHandler) Type() string
Type returns the eval type identifier.
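For readers unfamiliar with the metric, cosine similarity is the dot product of two vectors divided by the product of their magnitudes. The sketch below is a standalone version of that formula; the function name and the error handling for mismatched or zero vectors are assumptions, not the handler's documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// Hypothetical helper; how the handler treats invalid input is not documented.
func cosineSimilarity(a, b []float64) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("zero-magnitude vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

func main() {
	reference := []float64{1, 0}
	target := []float64{1, 1}
	minSimilarity := 0.7

	sim, err := cosineSimilarity(reference, target)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("similarity=%.4f pass=%t\n", sim, sim >= minSimilarity)
}
```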
type JSONSchemaHandler ¶
type JSONSchemaHandler struct{}
JSONSchemaHandler validates CurrentOutput against a JSON schema. Params: schema map[string]any.
func (*JSONSchemaHandler) Eval ¶
func (h *JSONSchemaHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval validates the current output against the provided JSON schema.
func (*JSONSchemaHandler) Type ¶
func (h *JSONSchemaHandler) Type() string
Type returns the eval type identifier.
type JSONValidHandler ¶
type JSONValidHandler struct{}
JSONValidHandler checks if CurrentOutput is valid JSON. No required params.
func (*JSONValidHandler) Eval ¶
func (h *JSONValidHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, _ map[string]any) (result *evals.EvalResult, err error)
Eval checks that the current output is parseable JSON.
func (*JSONValidHandler) Type ¶
func (h *JSONValidHandler) Type() string
Type returns the eval type identifier.
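A JSON validity check of this kind can be expressed with the standard library's json.Valid. The sketch below assumes that any syntactically valid JSON value counts, including bare scalars; whether JSONValidHandler is that permissive is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isValidJSON reports whether output is a syntactically valid JSON value.
// Hypothetical helper built on encoding/json; not the handler itself.
func isValidJSON(output string) bool {
	return json.Valid([]byte(output))
}

func main() {
	fmt.Println(isValidJSON(`{"status": "ok", "items": [1, 2, 3]}`)) // true
	fmt.Println(isValidJSON(`{"status": `))                          // false
}
```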
type JudgeOpts ¶
type JudgeOpts struct {
	// Content is the text being evaluated (assistant response or full conversation).
	Content string
	// Criteria describes what the judge should evaluate (e.g. "Is the response helpful?").
	Criteria string
	// Rubric provides detailed scoring guidance (optional).
	Rubric string
	// Model specifies which model to use for judging (optional, provider decides default).
	Model string
	// SystemPrompt overrides the default judge system prompt (optional).
	SystemPrompt string
	// MinScore is the minimum score threshold for passing (optional).
	MinScore *float64
	// Extra holds additional parameters for provider-specific features.
	Extra map[string]any
}
JudgeOpts configures a judge evaluation request.
type JudgeProvider ¶
type JudgeProvider interface {
	// Judge sends the evaluation prompt to an LLM and returns
	// the parsed verdict. Implementations handle provider selection,
	// prompt formatting, and response parsing.
	Judge(ctx context.Context, opts JudgeOpts) (*JudgeResult, error)
}
JudgeProvider abstracts LLM access for judge-based evaluations. Arena, SDK, and eval workers each provide their own implementation wiring their respective provider infrastructure.
type JudgeResult ¶
type JudgeResult struct {
	// Passed indicates whether the content met the evaluation criteria.
	Passed bool
	// Score is the numerical score assigned by the judge (typically 0.0-1.0).
	Score float64
	// Reasoning explains the judge's evaluation.
	Reasoning string
	// Raw is the unprocessed LLM response text.
	Raw string
}
JudgeResult captures the output of an LLM judge evaluation.
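To show how the three judge types fit together, here is a minimal stub provider that could stand in for an LLM in tests. The types are trimmed local copies so the sketch compiles on its own; keywordJudge, judgeOnce, and the 0.5 fallback threshold are all assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Trimmed local copies of the documented types, declared here only so the
// example is self-contained.
type JudgeOpts struct {
	Content  string
	Criteria string
	MinScore *float64
}

type JudgeResult struct {
	Passed    bool
	Score     float64
	Reasoning string
	Raw       string
}

type JudgeProvider interface {
	Judge(ctx context.Context, opts JudgeOpts) (*JudgeResult, error)
}

// keywordJudge is a hypothetical stand-in for an LLM-backed judge: it scores
// 1.0 when Content mentions Keyword (case-insensitive) and 0.0 otherwise.
type keywordJudge struct{ Keyword string }

func (j keywordJudge) Judge(ctx context.Context, opts JudgeOpts) (*JudgeResult, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	score := 0.0
	if strings.Contains(strings.ToLower(opts.Content), strings.ToLower(j.Keyword)) {
		score = 1.0
	}
	threshold := 0.5 // assumed fallback when MinScore is unset
	if opts.MinScore != nil {
		threshold = *opts.MinScore
	}
	return &JudgeResult{
		Passed:    score >= threshold,
		Score:     score,
		Reasoning: fmt.Sprintf("keyword %q check", j.Keyword),
		Raw:       fmt.Sprintf("score=%.1f", score),
	}, nil
}

// judgeOnce calls the provider with a background context.
func judgeOnce(p JudgeProvider, opts JudgeOpts) (*JudgeResult, error) {
	return p.Judge(context.Background(), opts)
}

func main() {
	res, err := judgeOnce(keywordJudge{Keyword: "refund"}, JudgeOpts{
		Content:  "Your REFUND is on its way.",
		Criteria: "Does the response mention the refund?",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("passed=%t score=%.1f\n", res.Passed, res.Score)
}
```

A deterministic stub like this keeps judge-based evals testable without network access; a real provider would replace the keyword check with a model call and response parsing.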
type LLMJudgeHandler ¶
type LLMJudgeHandler struct{}
LLMJudgeHandler evaluates a single assistant turn using an LLM judge. The JudgeProvider must be supplied in evalCtx.Metadata["judge_provider"].
Params:
- criteria (string, required): what to evaluate
- rubric (string, optional): detailed scoring guidance
- model (string, optional): model override for the judge
- system_prompt (string, optional): override default system prompt
- min_score (float64, optional): minimum score to pass
func (*LLMJudgeHandler) Eval ¶
func (h *LLMJudgeHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval runs the LLM judge on the current assistant output.
func (*LLMJudgeHandler) Type ¶
func (h *LLMJudgeHandler) Type() string
Type returns the eval type identifier.
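Because params arrives as a map[string]any, the required and optional keys listed above have to be extracted with type assertions. The following sketch shows one way to do that for criteria and min_score; parseJudgeParams is a hypothetical helper, and accepting an int for min_score is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// parseJudgeParams extracts the required "criteria" string and the optional
// "min_score" number from an untyped params map. Hypothetical helper; the
// handler's own validation may differ.
func parseJudgeParams(params map[string]any) (string, *float64, error) {
	criteria, ok := params["criteria"].(string)
	if !ok || criteria == "" {
		return "", nil, errors.New("criteria is required and must be a non-empty string")
	}
	switch v := params["min_score"].(type) {
	case nil:
		// Key absent: leave the threshold unset.
		return criteria, nil, nil
	case float64:
		return criteria, &v, nil
	case int:
		f := float64(v)
		return criteria, &f, nil
	default:
		return "", nil, fmt.Errorf("min_score must be a number, got %T", v)
	}
}

func main() {
	criteria, minScore, err := parseJudgeParams(map[string]any{
		"criteria":  "Is the response polite?",
		"min_score": 0.8,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(criteria, *minScore)
}
```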
type LLMJudgeSessionHandler ¶
type LLMJudgeSessionHandler struct{}
LLMJudgeSessionHandler evaluates an entire conversation using an LLM judge. It concatenates all assistant messages into a single content string for evaluation.
The JudgeProvider must be supplied in evalCtx.Metadata["judge_provider"].
Params:
- criteria (string, required): what to evaluate
- rubric (string, optional): detailed scoring guidance
- model (string, optional): model override for the judge
- system_prompt (string, optional): override default system prompt
- min_score (float64, optional): minimum score to pass
func (*LLMJudgeSessionHandler) Eval ¶
func (h *LLMJudgeSessionHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval runs the LLM judge on all assistant messages in the session.
func (*LLMJudgeSessionHandler) Type ¶
func (h *LLMJudgeSessionHandler) Type() string
Type returns the eval type identifier.
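The concatenation step for LLMJudgeSessionHandler can be pictured as filtering a conversation by role and joining the results. In the sketch below, the message type, the "assistant" role string, and the newline separator are all assumptions; the source does not say how the handler delimits messages.

```go
package main

import (
	"fmt"
	"strings"
)

// message is a hypothetical conversation entry used only for this example.
type message struct {
	Role    string
	Content string
}

// joinAssistantMessages concatenates the content of all assistant messages,
// in order, separated by newlines (the separator is an assumption).
func joinAssistantMessages(msgs []message) string {
	var parts []string
	for _, m := range msgs {
		if m.Role == "assistant" {
			parts = append(parts, m.Content)
		}
	}
	return strings.Join(parts, "\n")
}

func main() {
	conversation := []message{
		{Role: "user", Content: "Hi"},
		{Role: "assistant", Content: "Hello! How can I help?"},
		{Role: "user", Content: "Cancel my order."},
		{Role: "assistant", Content: "Done, your order is cancelled."},
	}
	fmt.Println(joinAssistantMessages(conversation))
}
```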
type LatencyBudgetHandler ¶
type LatencyBudgetHandler struct{}
LatencyBudgetHandler checks Metadata["latency_ms"] against a maximum latency budget. Params: max_ms float64.
func (*LatencyBudgetHandler) Eval ¶
func (h *LatencyBudgetHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval checks that the latency is within budget.
func (*LatencyBudgetHandler) Type ¶
func (h *LatencyBudgetHandler) Type() string
Type returns the eval type identifier.
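A latency budget check reduces to reading two numbers out of untyped maps and comparing them. The sketch below is one possible shape for that; toFloat and withinBudget are hypothetical helpers, and treating the budget as inclusive (latency equal to max_ms passes) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// toFloat converts common numeric types found in map[string]any to float64.
func toFloat(v any) (float64, bool) {
	switch n := v.(type) {
	case float64:
		return n, true
	case float32:
		return float64(n), true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	}
	return 0, false
}

// withinBudget compares metadata["latency_ms"] with params["max_ms"].
// The comparison is inclusive, which is an assumption of this sketch.
func withinBudget(metadata, params map[string]any) (bool, error) {
	latency, ok := toFloat(metadata["latency_ms"])
	if !ok {
		return false, errors.New("latency_ms is missing or not numeric")
	}
	maxMS, ok := toFloat(params["max_ms"])
	if !ok {
		return false, errors.New("max_ms is missing or not numeric")
	}
	return latency <= maxMS, nil
}

func main() {
	ok, err := withinBudget(
		map[string]any{"latency_ms": 420.0},
		map[string]any{"max_ms": 500},
	)
	fmt.Println(ok, err)
}
```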
type RegexHandler ¶
type RegexHandler struct{}
RegexHandler checks if CurrentOutput matches a regex pattern. Params: pattern string.
func (*RegexHandler) Eval ¶
func (h *RegexHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval checks that the current output matches the regex pattern.
func (*RegexHandler) Type ¶
func (h *RegexHandler) Type() string
Type returns the eval type identifier.
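A regex check of this kind can be written with the standard regexp package. The sketch below assumes unanchored matching (the pattern may match anywhere in the output unless it uses ^ and $); whether RegexHandler behaves the same way is not specified here, and matchesPattern is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesPattern compiles pattern and reports whether it matches anywhere in
// output. Invalid patterns are returned as errors instead of panicking.
func matchesPattern(output, pattern string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, fmt.Errorf("invalid pattern: %w", err)
	}
	return re.MatchString(output), nil
}

func main() {
	ok, err := matchesPattern("Your order ID is ORD-1234.", `ORD-\d{4}`)
	fmt.Println(ok, err) // true <nil>

	_, err = matchesPattern("anything", `(`)
	fmt.Println(err != nil) // true
}
```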
type ToolArgsExcludedSessionHandler ¶
type ToolArgsExcludedSessionHandler struct{}
ToolArgsExcludedSessionHandler checks that a tool was NOT called with specific argument values across the session. Params: tool_name string, excluded_args map[string]any.
func (*ToolArgsExcludedSessionHandler) Eval ¶
func (h *ToolArgsExcludedSessionHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (_ *evals.EvalResult, _ error)
Eval ensures the tool was never called with excluded args.
func (*ToolArgsExcludedSessionHandler) Type ¶
func (h *ToolArgsExcludedSessionHandler) Type() string
Type returns the eval type identifier.
type ToolArgsHandler ¶
type ToolArgsHandler struct{}
ToolArgsHandler checks that a tool was called with specific args. Params: tool_name string, expected_args map[string]any.
func (*ToolArgsHandler) Eval ¶
func (h *ToolArgsHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval checks that the specified tool was called with matching args.
func (*ToolArgsHandler) Type ¶
func (h *ToolArgsHandler) Type() string
Type returns the eval type identifier.
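One plausible reading of "called with specific args" is a subset match: every expected key must be present in the call's arguments with an equal value, while extra arguments are ignored. The sketch below implements that reading; toolCall, calledWithArgs, and the subset semantics are assumptions, not confirmed handler behavior.

```go
package main

import (
	"fmt"
	"reflect"
)

// toolCall is a hypothetical record of one tool invocation.
type toolCall struct {
	Name string
	Args map[string]any
}

// argsMatch reports whether every expected key exists in actual with a
// deeply equal value. Extra keys in actual are ignored.
func argsMatch(actual, expected map[string]any) bool {
	for k, want := range expected {
		got, ok := actual[k]
		if !ok || !reflect.DeepEqual(got, want) {
			return false
		}
	}
	return true
}

// calledWithArgs reports whether any call to toolName matches expected.
func calledWithArgs(calls []toolCall, toolName string, expected map[string]any) bool {
	for _, c := range calls {
		if c.Name == toolName && argsMatch(c.Args, expected) {
			return true
		}
	}
	return false
}

func main() {
	calls := []toolCall{
		{Name: "search", Args: map[string]any{"query": "go generics", "limit": 5}},
	}
	fmt.Println(calledWithArgs(calls, "search", map[string]any{"query": "go generics"})) // true
	fmt.Println(calledWithArgs(calls, "search", map[string]any{"query": "rust"}))        // false
}
```

Note that reflect.DeepEqual distinguishes int(5) from float64(5), which matters when arguments were decoded from JSON (numbers become float64); a production comparison would likely normalize numeric types first.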
type ToolArgsSessionHandler ¶
type ToolArgsSessionHandler struct{}
ToolArgsSessionHandler checks that a tool was called with specific arguments across the session. Params: tool_name string, expected_args map[string]any.
func (*ToolArgsSessionHandler) Eval ¶
func (h *ToolArgsSessionHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (_ *evals.EvalResult, _ error)
Eval checks tool calls for expected arguments.
func (*ToolArgsSessionHandler) Type ¶
func (h *ToolArgsSessionHandler) Type() string
Type returns the eval type identifier.
type ToolsCalledHandler ¶
type ToolsCalledHandler struct{}
ToolsCalledHandler checks that specific tools were called. Params: tool_names []string, min_calls int (optional).
func (*ToolsCalledHandler) Eval ¶
func (h *ToolsCalledHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval checks that all expected tools were called.
func (*ToolsCalledHandler) Type ¶
func (h *ToolsCalledHandler) Type() string
Type returns the eval type identifier.
type ToolsCalledSessionHandler ¶
type ToolsCalledSessionHandler struct{}
ToolsCalledSessionHandler checks that specific tools were called across the full session. Params: tool_names []string, min_calls int (optional, default 1).
func (*ToolsCalledSessionHandler) Eval ¶
func (h *ToolsCalledSessionHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (_ *evals.EvalResult, _ error)
Eval checks that all required tools were called at least min_calls times.
func (*ToolsCalledSessionHandler) Type ¶
func (h *ToolsCalledSessionHandler) Type() string
Type returns the eval type identifier.
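The min_calls rule can be illustrated by counting tool names across a session and listing the required tools that fall short. The helper below is a sketch under the assumption that the session's tool calls are available as a flat list of names; toolsBelowMinCalls is hypothetical, and its fallback to 1 mirrors the documented default.

```go
package main

import "fmt"

// toolsBelowMinCalls returns the required tools that were called fewer than
// minCalls times, in the order they were requested. A minCalls below 1 is
// treated as 1, matching the documented default.
func toolsBelowMinCalls(calledTools, required []string, minCalls int) []string {
	if minCalls < 1 {
		minCalls = 1
	}
	counts := make(map[string]int, len(calledTools))
	for _, name := range calledTools {
		counts[name]++
	}
	var short []string
	for _, name := range required {
		if counts[name] < minCalls {
			short = append(short, name)
		}
	}
	return short
}

func main() {
	session := []string{"search", "search", "fetch"}
	fmt.Println(toolsBelowMinCalls(session, []string{"search", "fetch"}, 1)) // []
	fmt.Println(toolsBelowMinCalls(session, []string{"search", "fetch"}, 2)) // [fetch]
}
```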
type ToolsNotCalledHandler ¶
type ToolsNotCalledHandler struct{}
ToolsNotCalledHandler checks that specific tools were NOT called. Params: tool_names []string.
func (*ToolsNotCalledHandler) Eval ¶
func (h *ToolsNotCalledHandler) Eval(_ context.Context, evalCtx *evals.EvalContext, params map[string]any) (result *evals.EvalResult, err error)
Eval checks that none of the forbidden tools were called.
func (*ToolsNotCalledHandler) Type ¶
func (h *ToolsNotCalledHandler) Type() string
Type returns the eval type identifier.
type ToolsNotCalledSessionHandler ¶
type ToolsNotCalledSessionHandler struct{}
ToolsNotCalledSessionHandler checks that specific tools were NOT called anywhere in the session. Params: tool_names []string.
func (*ToolsNotCalledSessionHandler) Eval ¶
func (h *ToolsNotCalledSessionHandler) Eval(ctx context.Context, evalCtx *evals.EvalContext, params map[string]any) (_ *evals.EvalResult, _ error)
Eval ensures forbidden tools were never called across the session.
func (*ToolsNotCalledSessionHandler) Type ¶
func (h *ToolsNotCalledSessionHandler) Type() string
Type returns the eval type identifier.
Source Files ¶
- contains.go
- contains_any.go
- content_excludes.go
- cosine_similarity.go
- helpers.go
- json_schema.go
- json_valid.go
- judge_provider.go
- latency_budget.go
- llm_judge.go
- llm_judge_session.go
- regex.go
- register.go
- tool_args.go
- tool_args_excluded_session.go
- tool_args_session.go
- tools_called.go
- tools_called_session.go
- tools_not_called.go
- tools_not_called_session.go