Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
func ConfidenceInterval95
ConfidenceInterval95 returns the 95% confidence interval (low, high) using the normal approximation (z=1.96). Returns (mean, mean) when fewer than 2 data points are available.
func IsFlaky
IsFlaky returns true when the pass rate is strictly between 0 and 1, meaning the task sometimes passes and sometimes fails.
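The flakiness rule reduces to a strict range check on the pass rate. The following is a minimal sketch. The real IsFlaky signature is not shown here, so `isFlaky` taking a `float64` pass rate and the `passRate` helper (passes / total, returning 0 for zero runs) are hypothetical.

```go
package main

import "fmt"

// isFlaky is a hypothetical sketch of the documented rule; the actual
// signature of IsFlaky is not shown. Assumes passRate is in [0, 1].
func isFlaky(passRate float64) bool {
	// Strictly between 0 and 1: at least one pass and at least one failure.
	return passRate > 0 && passRate < 1
}

// passRate is a hypothetical helper: passes divided by total runs,
// returning 0 when there are no runs.
func passRate(passes, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(passes) / float64(total)
}

func main() {
	fmt.Println(isFlaky(passRate(3, 5))) // true: sometimes passes, sometimes fails
	fmt.Println(isFlaky(passRate(5, 5))) // false: always passes
	fmt.Println(isFlaky(passRate(0, 5))) // false: always fails
}
```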
Types
type BehaviorMetrics
type BehaviorMetrics struct {
	ToolCallCount           int      `json:"tool_call_count"`
	IterationCount          int      `json:"iteration_count"`
	MaxToolCallsAllowed     int      `json:"max_tool_calls_allowed,omitempty"`
	MaxToolCallsPassed      bool     `json:"max_tool_calls_passed"`
	MaxIterations           int      `json:"max_iterations,omitempty"`
	MaxIterationsPassed     bool     `json:"max_iterations_passed"`
	MaxResponseTimeMs       int64    `json:"max_response_time_ms,omitempty"`
	ActualResponseTimeMs    int64    `json:"actual_response_time_ms"`
	MaxResponseTimeMsPassed bool     `json:"max_response_time_ms_passed"`
	RequiredTools           []string `json:"required_tools,omitempty"`
	RequiredToolsUsed       []string `json:"required_tools_used,omitempty"`
	RequiredToolsMissed     []string `json:"required_tools_missed,omitempty"`
	RequiredToolsPassed     bool     `json:"required_tools_passed"`
	ForbiddenTools          []string `json:"forbidden_tools,omitempty"`
	ForbiddenToolsUsed      []string `json:"forbidden_tools_used,omitempty"`
	ForbiddenToolsPassed    bool     `json:"forbidden_tools_passed"`
	EfficiencyScore         float64  `json:"efficiency_score"`
}
BehaviorMetrics captures quality metrics for agent behavior during a run.
func ComputeBehaviorMetrics
func ComputeBehaviorMetrics(run *models.RunResult, rules *models.BehaviorRules) *BehaviorMetrics
ComputeBehaviorMetrics analyzes a RunResult against BehaviorRules and returns quality metrics including compliance checks and an efficiency score.
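The compliance checks map naturally onto set membership (required and forbidden tools) and limit comparisons (tool calls, iterations, response time). The following sketch illustrates those two kinds of check. It is not ComputeBehaviorMetrics itself: the fields of models.RunResult and models.BehaviorRules are not documented, so `checkTools`, `withinLimit`, `toolCheck`, the tool names, and the rule that a limit of 0 means "no limit" (suggested only by the `omitempty` tags) are all assumptions. The efficiency score is left out because its formula is not documented.

```go
package main

import "fmt"

// toolCheck is a hypothetical subset of the BehaviorMetrics tool fields.
type toolCheck struct {
	RequiredToolsUsed    []string
	RequiredToolsMissed  []string
	RequiredToolsPassed  bool
	ForbiddenToolsUsed   []string
	ForbiddenToolsPassed bool
}

// checkTools is a hypothetical helper; it assumes the run exposes the
// names of the tools it called as a plain string slice.
func checkTools(called, required, forbidden []string) toolCheck {
	seen := make(map[string]bool, len(called))
	for _, name := range called {
		seen[name] = true
	}
	var c toolCheck
	for _, t := range required {
		if seen[t] {
			c.RequiredToolsUsed = append(c.RequiredToolsUsed, t)
		} else {
			c.RequiredToolsMissed = append(c.RequiredToolsMissed, t)
		}
	}
	for _, t := range forbidden {
		if seen[t] {
			c.ForbiddenToolsUsed = append(c.ForbiddenToolsUsed, t)
		}
	}
	c.RequiredToolsPassed = len(c.RequiredToolsMissed) == 0
	c.ForbiddenToolsPassed = len(c.ForbiddenToolsUsed) == 0
	return c
}

// withinLimit is a hypothetical helper; it assumes a limit of 0 means
// "no limit configured" and that reaching the limit still passes.
func withinLimit(actual, limit int64) bool {
	return limit == 0 || actual <= limit
}

func main() {
	c := checkTools(
		[]string{"read_file", "grep", "read_file", "shell"},
		[]string{"read_file", "write_file"},
		[]string{"shell"},
	)
	fmt.Println(c.RequiredToolsMissed, c.RequiredToolsPassed, c.ForbiddenToolsUsed, c.ForbiddenToolsPassed)
	// [write_file] false [shell] false
	fmt.Println(withinLimit(4, 3), withinLimit(4, 0)) // false true
}
```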
func (*BehaviorMetrics) AllConstraintsPassed
func (m *BehaviorMetrics) AllConstraintsPassed() bool
AllConstraintsPassed returns true when every behavioral constraint is met.
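BehaviorMetrics carries five `...Passed` flags, so a natural reading of "every behavioral constraint is met" is their logical AND. The following minimal sketch uses a trimmed, hypothetical `behaviorFlags` type holding only those fields. Whether the real method also considers other fields (for example EfficiencyScore) is not stated, so treat this as an assumption.

```go
package main

import "fmt"

// behaviorFlags is a hypothetical stand-in for BehaviorMetrics that keeps
// only its five pass/fail fields.
type behaviorFlags struct {
	MaxToolCallsPassed      bool
	MaxIterationsPassed     bool
	MaxResponseTimeMsPassed bool
	RequiredToolsPassed     bool
	ForbiddenToolsPassed    bool
}

// allConstraintsPassed assumes "every constraint" means the AND of all
// five flags; a single failing check makes the whole result false.
func (m *behaviorFlags) allConstraintsPassed() bool {
	return m.MaxToolCallsPassed &&
		m.MaxIterationsPassed &&
		m.MaxResponseTimeMsPassed &&
		m.RequiredToolsPassed &&
		m.ForbiddenToolsPassed
}

func main() {
	m := &behaviorFlags{true, true, true, true, false}
	fmt.Println(m.allConstraintsPassed()) // false: a forbidden tool was used
}
```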