package config

v0.13.3

Warning: this package is not in the latest version of its module.
Published: Dec 23, 2025 License: MPL-2.0 Imports: 21 Imported by: 0

Documentation

Overview

Package config contains the data models representing the structure of configuration and task definition files for the MindTrial application. It provides configuration management and handles loading and validation of application settings, provider configurations, and task definitions from YAML files.

Index

Constants

const (
	// OPENAI identifies the OpenAI provider.
	OPENAI string = "openai"
	// GOOGLE identifies the Google AI provider.
	GOOGLE string = "google"
	// ANTHROPIC identifies the Anthropic provider.
	ANTHROPIC string = "anthropic"
	// DEEPSEEK identifies the DeepSeek provider.
	DEEPSEEK string = "deepseek"
	// MISTRALAI identifies the Mistral AI provider.
	MISTRALAI string = "mistralai"
	// XAI identifies the xAI provider.
	XAI string = "xai"
	// ALIBABA identifies the Alibaba provider.
	ALIBABA string = "alibaba"
	// MOONSHOTAI identifies the Moonshot AI provider.
	MOONSHOTAI string = "moonshotai"
)

Variables

var (
	// ErrInvalidTaskProperty indicates invalid task definition.
	ErrInvalidTaskProperty = errors.New("invalid task property")
	// ErrInvalidURI indicates that the specified URI is invalid or not supported.
	ErrInvalidURI = errors.New("invalid URI")
	// ErrDownloadFile indicates that a remote file could not be downloaded.
	ErrDownloadFile = errors.New("failed to download remote file")
	// ErrAccessFile indicates that a local file could not be accessed.
	ErrAccessFile = errors.New("file is not accessible")
)
var ErrInvalidConfigProperty = errors.New("invalid configuration property")

ErrInvalidConfigProperty indicates invalid configuration.

Functions

func CleanIfNotBlank

func CleanIfNotBlank(filePath string) string

CleanIfNotBlank cleans the given file path if it's not blank. Returns original path if it's blank.

func IsNotBlank

func IsNotBlank(value string) bool

IsNotBlank returns true if the given string contains non-whitespace characters.

func MakeAbs

func MakeAbs(baseDirPath string, filePath string) string

MakeAbs converts relative file path to absolute using the given base directory. Returns original path if it's already absolute or blank.
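The path helpers above compose naturally: blank-checking guards the join, and absolute paths pass through untouched. A minimal stdlib sketch of the documented behavior (the real implementations may differ):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isNotBlank mirrors IsNotBlank: true when the string has non-whitespace content.
func isNotBlank(value string) bool {
	return strings.TrimSpace(value) != ""
}

// makeAbs mirrors MakeAbs: join a relative path onto a base directory,
// returning absolute or blank paths unchanged.
func makeAbs(baseDirPath, filePath string) string {
	if !isNotBlank(filePath) || filepath.IsAbs(filePath) {
		return filePath
	}
	return filepath.Join(baseDirPath, filePath)
}

func main() {
	fmt.Println(makeAbs("/etc/mindtrial", "tasks.yaml"))
	fmt.Println(makeAbs("/etc/mindtrial", "/abs/tasks.yaml"))
}
```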

func OnceWithContext

func OnceWithContext[S any, T any](f func(context.Context, *S) (T, error)) func(context.Context, *S) (T, error)

OnceWithContext returns a function that invokes f only once regardless of the supplied context. The first call's context is used for execution, and subsequent calls simply return the cached result. This is similar to sync.OnceValues but specifically for functions that need a context.

func ResolveFileNamePattern

func ResolveFileNamePattern(pattern string, timeRef time.Time) string

ResolveFileNamePattern takes a filename pattern containing time placeholders and returns a string with the placeholders replaced by values from the given time reference. Supported placeholders: {{.Year}}, {{.Month}}, {{.Day}}, {{.Hour}}, {{.Minute}}, {{.Second}}. Returns the original pattern if it cannot be resolved.

func ResolveFlagOverride

func ResolveFlagOverride(override *bool, parentValue bool) bool

ResolveFlagOverride returns override value if not nil, otherwise returns parent value.
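This nil-means-inherit pattern also underlies run resolution (see GetRunsResolved below, where a run's nil Disabled falls back to the provider's value). A sketch of the documented semantics:

```go
package main

import "fmt"

// resolveFlagOverride mirrors the documented semantics: a non-nil override
// wins; a nil override inherits the parent value.
func resolveFlagOverride(override *bool, parentValue bool) bool {
	if override != nil {
		return *override
	}
	return parentValue
}

func main() {
	on := true
	fmt.Println(resolveFlagOverride(&on, false)) // explicit override wins
	fmt.Println(resolveFlagOverride(nil, false)) // nil inherits the parent
}
```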

Types

type AlibabaClientConfig added in v0.10.1

type AlibabaClientConfig struct {
	// APIKey is the API key for the Alibaba provider.
	APIKey string `yaml:"api-key" validate:"required"`
	// Endpoint specifies the network endpoint URL for the API.
	Endpoint string `yaml:"endpoint" validate:"omitempty,url"`
}

AlibabaClientConfig represents Alibaba provider settings.

func (AlibabaClientConfig) GetEndpoint added in v0.10.1

func (c AlibabaClientConfig) GetEndpoint() string

GetEndpoint returns the endpoint URL, defaulting to Singapore endpoint if not specified.

type AlibabaModelParams added in v0.10.1

type AlibabaModelParams struct {
	// TextResponseFormat indicates whether to use plain-text response format
	// for compatibility with models that do not support JSON mode (e.g., when
	// thinking is enabled on certain Qwen models).
	TextResponseFormat bool `yaml:"text-response-format" validate:"omitempty"`

	// Temperature controls the randomness or "creativity" of the model's outputs.
	// Notes: Higher values (e.g. 0.8) make outputs more random; lower values
	// (e.g. 0.2) make outputs more focused and deterministic.
	// Notes: Use either `Temperature` or `TopP`, not both, for sampling control.
	// Valid range: 0.0 — 2.0. Default: 1.0.
	Temperature *float32 `yaml:"temperature" validate:"omitempty,min=0,max=2"`

	// TopP controls diversity via nucleus sampling (probability mass cutoff).
	// Notes: Use either `Temperature` or `TopP`, not both, for sampling control.
	// Valid range: 0.0 — 1.0. Default varies by model.
	TopP *float32 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// PresencePenalty penalizes new tokens based on whether they appear in the text so far.
	// Notes: Positive values encourage the model to introduce new topics.
	// Valid range: [-2.0, 2.0]. Default: 0.0.
	PresencePenalty *float32 `yaml:"presence-penalty" validate:"omitempty,min=-2,max=2"`

	// FrequencyPenalty penalizes new tokens based on their frequency in text so far.
	// Notes: Positive values encourage model to use less frequent tokens.
	// Valid range: [-2.0, 2.0]. Default: 0.0.
	FrequencyPenalty *float32 `yaml:"frequency-penalty" validate:"omitempty,min=-2,max=2"`

	// MaxTokens controls the maximum number of tokens available to the model for generating a response.
	MaxTokens *int32 `yaml:"max-tokens" validate:"omitempty,min=0"`

	// Seed makes text generation more deterministic. If specified, the system will
	// attempt to return the same result for the same inputs with the same seed value and parameters.
	Seed *uint32 `yaml:"seed" validate:"omitempty"`

	// DisableLegacyJsonMode toggles a compatibility behavior for certain models.
	// In the legacy mode (default), a standard response format instruction is included
	// in the prompt to guide the model to respond in a structured JSON format.
	// This is necessary for models that do not fully support schema-based structured JSON output.
	DisableLegacyJsonMode *bool `yaml:"disable-legacy-json-mode" validate:"omitempty"`
}

AlibabaModelParams represents Alibaba model-specific settings.

type AnthropicClientConfig

type AnthropicClientConfig struct {
	// APIKey is the API key for the Anthropic generative models provider.
	APIKey string `yaml:"api-key" validate:"required"`
	// RequestTimeout specifies the timeout for API requests.
	RequestTimeout *time.Duration `yaml:"request-timeout" validate:"omitempty"`
}

AnthropicClientConfig represents Anthropic provider settings.

type AnthropicModelParams

type AnthropicModelParams struct {
	// MaxTokens controls the maximum number of tokens available to the model for generating a response.
	// This includes the thinking budget for reasoning models.
	MaxTokens *int64 `yaml:"max-tokens" validate:"omitempty,min=0"`

	// ThinkingBudgetTokens specifies the number of tokens the model can use for its internal reasoning process.
	// It must be at least 1024 and less than `MaxTokens`.
	// If set, this enables enhanced reasoning capabilities for the model.
	ThinkingBudgetTokens *int64 `yaml:"thinking-budget-tokens" validate:"omitempty,min=1024,ltfield=MaxTokens"`

	// Temperature controls the randomness or "creativity" of responses.
	// Values range from 0.0 to 1.0, with lower values making the output more focused.
	// The default value is 1.0.
	// It is generally recommended to alter this or `TopP` but not both.
	Temperature *float64 `yaml:"temperature" validate:"omitempty,min=0,max=1"`

	// TopP controls diversity via nucleus sampling.
	// Values range from 0.0 to 1.0, with lower values making the output more focused.
	// You usually only need to use `Temperature`.
	TopP *float64 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// TopK limits response tokens to top K options for each token position.
	// Higher values allow more diverse outputs by considering more token options.
	// You usually only need to use `Temperature`.
	TopK *int64 `yaml:"top-k" validate:"omitempty,min=0"`
}

AnthropicModelParams represents Anthropic model-specific settings.
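An illustrative YAML fragment for these parameters; the key names follow the yaml tags above, but the budget values are examples only, and where this fragment nests inside a run configuration is not shown on this page:

```yaml
# Example Anthropic model parameters (values are illustrative).
max-tokens: 8192
thinking-budget-tokens: 2048   # >= 1024 and < max-tokens; enables extended reasoning
temperature: 1.0               # alter this or top-p, not both
```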

type AppConfig

type AppConfig struct {
	// LogFile specifies path to the log file.
	LogFile string `yaml:"log-file" validate:"omitempty,filepath"`

	// OutputDir specifies directory where results will be saved.
	OutputDir string `yaml:"output-dir" validate:"required"`

	// OutputBaseName specifies base filename for result files.
	OutputBaseName string `yaml:"output-basename" validate:"omitempty,filepath"`

	// TaskSource specifies path to the task definitions file.
	TaskSource string `yaml:"task-source" validate:"required,filepath"`

	// Providers lists configurations for AI providers whose models will be used
	// to execute tasks during the trial run.
	Providers []ProviderConfig `yaml:"providers" validate:"required,dive"`

	// Judges lists LLM configurations for semantic evaluation of open-ended task responses.
	Judges []JudgeConfig `yaml:"judges" validate:"omitempty,unique=Name,dive"`

	// Tools lists common tool configurations available to tasks.
	Tools []ToolConfig `yaml:"tools" validate:"omitempty,unique=Name,dive"`
}

AppConfig defines application-wide settings.
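A minimal illustrative configuration file matching these yaml tags. Only keys documented on this page are used; the run's `model` key and the API key value are assumptions, since RunConfig's fields are not shown here:

```yaml
config:
  output-dir: ./results
  task-source: ./tasks.yaml
  providers:
    - name: openai
      client-config:
        api-key: your-api-key-here   # placeholder
      runs:
        - name: gpt-default
          model: gpt-4o              # assumed RunConfig field
```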

func (AppConfig) GetJudgesWithEnabledRuns added in v0.5.0

func (ac AppConfig) GetJudgesWithEnabledRuns() []JudgeConfig

GetJudgesWithEnabledRuns returns judges with their enabled run variant configurations. Run variant configurations are resolved using GetRunsResolved before filtering. Any disabled run variant configurations are excluded from the results. Judges with no enabled run variant configurations are excluded from the returned list.

func (AppConfig) GetProvidersWithEnabledRuns

func (ac AppConfig) GetProvidersWithEnabledRuns() []ProviderConfig

GetProvidersWithEnabledRuns returns providers with their enabled run configurations. Run configurations are resolved using GetRunsResolved before filtering. Any disabled run configurations are excluded from the results. Providers with no enabled run configurations are excluded from the returned list.

type ClientConfig

type ClientConfig interface{}

ClientConfig is a marker interface for provider-specific configurations.

type Config

type Config struct {
	// Config contains application-wide settings.
	Config AppConfig `yaml:"config" validate:"required"`
}

Config represents the top-level configuration structure.

func LoadConfigFromFile

func LoadConfigFromFile(ctx context.Context, path string) (*Config, error)

LoadConfigFromFile reads and validates application configuration from the specified file path. Returns error if the file cannot be read or contains invalid configuration.

type DeepseekClientConfig

type DeepseekClientConfig struct {
	// APIKey is the API key for the DeepSeek generative models provider.
	APIKey string `yaml:"api-key" validate:"required"`
	// RequestTimeout specifies the timeout for API requests.
	RequestTimeout *time.Duration `yaml:"request-timeout" validate:"omitempty"`
}

DeepseekClientConfig represents DeepSeek provider settings.

type DeepseekModelParams

type DeepseekModelParams struct {
	// Temperature controls the randomness or "creativity" of the model's outputs.
	// Values range from 0.0 to 2.0, with lower values making the output more focused.
	// The default value is 1.0.
	// Recommended values by use case:
	// - 0.0: Coding / Math (best for precise, deterministic outputs)
	// - 1.0: Data Cleaning / Data Analysis
	// - 1.3: General Conversation / Translation
	// - 1.5: Creative Writing / Poetry (more varied and creative outputs)
	Temperature *float32 `yaml:"temperature" validate:"omitempty,min=0,max=2"`

	// TopP controls diversity via nucleus sampling.
	// Values range from 0.0 to 1.0, with lower values making the output more focused.
	// You usually only need to use `Temperature`.
	TopP *float32 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// PresencePenalty penalizes new tokens based on whether they appear in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use new tokens,
	// increasing the model's likelihood to talk about new topics.
	// The default value is 0.0.
	PresencePenalty *float32 `yaml:"presence-penalty" validate:"omitempty,min=-2,max=2"`

	// FrequencyPenalty penalizes new tokens based on their frequency in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use less frequent tokens,
	// decreasing the model's likelihood to repeat the same line verbatim.
	// The default value is 0.0.
	FrequencyPenalty *float32 `yaml:"frequency-penalty" validate:"omitempty,min=-2,max=2"`
}

DeepseekModelParams represents DeepSeek model-specific settings.

type GoogleAIClientConfig

type GoogleAIClientConfig struct {
	// APIKey is the API key for the Google AI generative models provider.
	APIKey string `yaml:"api-key" validate:"required"`
}

GoogleAIClientConfig represents Google AI provider settings.

type GoogleAIModelParams

type GoogleAIModelParams struct {
	// TextResponseFormat indicates whether to use plain-text response format
	// for compatibility with models that do not support JSON.
	// This setting applies to all tasks, including those with and without tools enabled.
	TextResponseFormat bool `yaml:"text-response-format" validate:"omitempty"`

	// TextResponseFormatWithTools forces plain-text response format when tools are enabled.
	// If true, forces plain-text mode when tools are used (required for pre-Gemini 3 models).
	// If false or unset, uses JSON schema mode with tools (Gemini 3+ default behavior).
	// This setting only applies to tasks with tools enabled.
	TextResponseFormatWithTools bool `yaml:"text-response-format-with-tools" validate:"omitempty"`

	// ThinkingLevel controls the maximum depth of the model's internal reasoning process.
	// Valid values: "low", "high". Gemini 3 Pro defaults to "high" if not specified.
	// - "low": Minimizes latency and cost, best for simple instruction following
	// - "high": Maximizes reasoning depth, the model may take longer but output is more carefully reasoned
	ThinkingLevel *string `yaml:"thinking-level" validate:"omitempty,oneof=low high"`

	// MediaResolution controls the maximum number of tokens allocated per input image or video frame.
	// Valid values: "low", "medium", "high". Higher resolutions improve fine text reading and small detail
	// identification but increase token usage and latency.
	// - "low": 280 tokens for images, 70 tokens per video frame
	// - "medium": 560 tokens for images, 70 tokens per video frame (same as low for video)
	// - "high": 1120 tokens for images, 280 tokens per video frame
	// If unspecified, the model uses optimal defaults based on media type.
	MediaResolution *string `yaml:"media-resolution" validate:"omitempty,oneof=low medium high"`

	// Temperature controls the randomness or "creativity" of the model's outputs.
	// Values range from 0.0 to 2.0, with lower values making the output more focused and deterministic.
	// The default value is typically around 1.0.
	Temperature *float32 `yaml:"temperature" validate:"omitempty,min=0,max=2"`

	// TopP controls diversity via nucleus sampling.
	// Values range from 0.0 to 1.0, with lower values making the output more focused.
	// The default value is typically around 1.0.
	TopP *float32 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// TopK limits response tokens to top K options for each token position.
	// Higher values allow more diverse outputs by considering more token options.
	TopK *int32 `yaml:"top-k" validate:"omitempty,min=0"`

	// PresencePenalty penalizes new tokens based on whether they appear in the text so far.
	// Positive values discourage the use of tokens that have already been used in the response,
	// increasing the vocabulary. Negative values encourage the use of tokens that have already been used.
	// This penalty is binary on/off and not dependent on the number of times the token is used.
	PresencePenalty *float32 `yaml:"presence-penalty" validate:"omitempty"`

	// FrequencyPenalty penalizes new tokens based on their frequency in the text so far.
	// Positive values discourage the use of tokens that have already been used, proportional to
	// the number of times the token has been used. Negative values encourage the model to reuse tokens.
	// This differs from PresencePenalty as it scales with frequency.
	FrequencyPenalty *float32 `yaml:"frequency-penalty" validate:"omitempty"`

	// Seed is used for deterministic generation. When set to a specific value, the model
	// makes a best effort to provide the same response for repeated requests.
	// If not set, a randomly generated seed is used.
	Seed *int32 `yaml:"seed" validate:"omitempty"`
}

GoogleAIModelParams represents Google AI model-specific settings.

type JudgeConfig added in v0.5.0

type JudgeConfig struct {
	// Name is the unique identifier for this judge configuration.
	Name string `yaml:"name" validate:"required"`

	// Provider encapsulates the provider configuration for the judge.
	Provider ProviderConfig `yaml:"provider" validate:"required"`
}

JudgeConfig defines configuration for an LLM judge used for semantic evaluation of complex open-ended task responses. Judges analyze the meaning and quality of answers rather than performing exact text matching, enabling evaluation of subjective or creative tasks where multiple valid interpretations exist.

func (JudgeConfig) Resolve added in v0.5.0

func (jc JudgeConfig) Resolve(excludeDisabledRuns bool) JudgeConfig

Resolve returns a copy of the judge configuration with run variants resolved. If excludeDisabledRuns is true, only enabled run variants are included.

type JudgePrompt added in v0.10.0

type JudgePrompt struct {
	// Template is the template string for the judge prompt.
	Template *string `yaml:"template" validate:"omitempty"`

	// VerdictFormat specifies how the judge should format its evaluation response.
	VerdictFormat *ResponseFormat `yaml:"verdict-format" validate:"omitempty"`

	// PassingVerdicts is the set of verdicts that count as a pass.
	PassingVerdicts *utils.ValueSet `yaml:"passing-verdicts" validate:"omitempty"`
	// contains filtered or unexported fields
}

JudgePrompt represents a judge prompt configuration for semantic validation.

func (*JudgePrompt) CompileJudgeTemplate added in v0.10.0

func (jp *JudgePrompt) CompileJudgeTemplate() error

CompileJudgeTemplate compiles the judge prompt template if it exists.

func (JudgePrompt) GetPassingVerdicts added in v0.10.0

func (jp JudgePrompt) GetPassingVerdicts() utils.ValueSet

GetPassingVerdicts returns the accepted verdicts set.

func (JudgePrompt) GetTemplate added in v0.10.0

func (jp JudgePrompt) GetTemplate() (template string, ok bool)

GetTemplate returns the template string and true if it is set and not blank.

func (JudgePrompt) GetVerdictFormat added in v0.10.0

func (jp JudgePrompt) GetVerdictFormat() ResponseFormat

GetVerdictFormat returns the verdict response format.

func (JudgePrompt) MergeWith added in v0.10.0

func (these JudgePrompt) MergeWith(other JudgePrompt) JudgePrompt

MergeWith merges this judge prompt with another and returns the result. The provided other values override these values if set.

func (JudgePrompt) ResolveJudgePrompt added in v0.10.0

func (jp JudgePrompt) ResolveJudgePrompt(data interface{}) (string, error)

ResolveJudgePrompt resolves the judge prompt template with the provided data. Returns the resolved prompt string and an error if the template execution fails. If no custom template is provided, uses the default judge prompt template.

type JudgeSelector added in v0.5.0

type JudgeSelector struct {
	// Enabled determines whether judge evaluation is enabled.
	Enabled *bool `yaml:"enabled" validate:"omitempty"`

	// Name specifies the name of the judge configuration to use.
	Name *string `yaml:"name" validate:"omitempty"`

	// Variant specifies the run variant name from the judge's provider configuration.
	Variant *string `yaml:"variant" validate:"omitempty"`

	// Prompt specifies the judge prompt configuration.
	Prompt JudgePrompt `yaml:"prompt" validate:"omitempty"`
}

JudgeSelector defines settings for using a judge in validation.

func (JudgeSelector) GetName added in v0.5.0

func (js JudgeSelector) GetName() (name string)

GetName returns the judge name, or empty string if not set.

func (JudgeSelector) GetVariant added in v0.5.0

func (js JudgeSelector) GetVariant() (variant string)

GetVariant returns the judge run variant, or empty string if not set.

func (JudgeSelector) IsEnabled added in v0.5.0

func (js JudgeSelector) IsEnabled() bool

IsEnabled returns whether judge evaluation is enabled.

func (JudgeSelector) MergeWith added in v0.5.0

func (these JudgeSelector) MergeWith(other JudgeSelector) JudgeSelector

MergeWith merges this judge configuration with another and returns the result. The provided other values override these values if set.

type LegacyJsonMode added in v0.12.2

type LegacyJsonMode int

LegacyJsonMode specifies the compatibility mode for JSON response formatting.

const (
	// LegacyJsonSchema adds a text-based JSON format instruction to the prompt
	// while keeping the json_schema response format with strict schema validation.
	// Use this mode for providers that require explicit JSON formatting guidance
	// in the prompt but support structured schema-based responses (e.g., Alibaba Qwen models).
	LegacyJsonSchema LegacyJsonMode = iota

	// LegacyJsonObject adds a text-based JSON format instruction to the prompt
	// and switches the response format to json_object mode without schema validation.
	// Use this mode for providers that only support basic JSON object responses
	// (e.g., Moonshot AI Kimi models).
	LegacyJsonObject
)

func (LegacyJsonMode) Ptr added in v0.12.2

func (l LegacyJsonMode) Ptr() *LegacyJsonMode

Ptr returns a pointer to the LegacyJsonMode value.

type MistralAIClientConfig added in v0.4.0

type MistralAIClientConfig struct {
	// APIKey is the API key for the Mistral AI generative models provider.
	APIKey string `yaml:"api-key" validate:"required"`
}

MistralAIClientConfig represents Mistral AI provider settings.

type MistralAIModelParams added in v0.4.0

type MistralAIModelParams struct {
	// Temperature controls the randomness or "creativity" of the model's outputs.
	// Values range from 0.0 to 1.5, with lower values making the output more focused and deterministic.
	// The default value varies depending on the model.
	// It is generally recommended to alter this or `TopP` but not both.
	Temperature *float32 `yaml:"temperature" validate:"omitempty,min=0,max=1.5"`

	// TopP controls diversity via nucleus sampling.
	// Values range from 0.0 to 1.0, with lower values making the output more focused.
	// The default value is 1.0.
	// It is generally recommended to alter this or `Temperature` but not both.
	TopP *float32 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// MaxTokens controls the maximum number of tokens to generate in the completion.
	// The token count of the prompt plus max_tokens cannot exceed the model's context length.
	MaxTokens *int32 `yaml:"max-tokens" validate:"omitempty,min=0"`

	// PresencePenalty penalizes new tokens based on whether they appear in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use new tokens,
	// increasing the model's likelihood to talk about new topics.
	// The default value is 0.0.
	PresencePenalty *float32 `yaml:"presence-penalty" validate:"omitempty,min=-2,max=2"`

	// FrequencyPenalty penalizes new tokens based on their frequency in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use less frequent tokens,
	// decreasing the model's likelihood to repeat the same line verbatim.
	// The default value is 0.0.
	FrequencyPenalty *float32 `yaml:"frequency-penalty" validate:"omitempty,min=-2,max=2"`

	// RandomSeed provides the seed to use for random sampling.
	// If set, requests will generate deterministic results.
	RandomSeed *int32 `yaml:"random-seed" validate:"omitempty"`

	// PromptMode sets the prompt mode for the request.
	// When set to "reasoning", a system prompt will be used to instruct the model to reason if supported.
	PromptMode *string `yaml:"prompt-mode" validate:"omitempty,oneof=reasoning"`

	// SafePrompt controls whether to inject a safety prompt before all conversations.
	SafePrompt *bool `yaml:"safe-prompt" validate:"omitempty"`
}

MistralAIModelParams represents Mistral AI model-specific settings.

type ModelParams

type ModelParams interface{}

ModelParams is a marker interface for model-specific parameters.

type MoonshotAIClientConfig added in v0.12.2

type MoonshotAIClientConfig struct {
	// APIKey is the API key for the Moonshot AI provider.
	APIKey string `yaml:"api-key" validate:"required"`
	// Endpoint specifies the network endpoint URL for the API.
	Endpoint string `yaml:"endpoint" validate:"omitempty,url"`
}

MoonshotAIClientConfig represents Moonshot AI provider settings.

func (MoonshotAIClientConfig) GetEndpoint added in v0.12.2

func (c MoonshotAIClientConfig) GetEndpoint() string

GetEndpoint returns the endpoint URL for Moonshot AI, defaulting to the public API base when not specified.

type MoonshotAIModelParams added in v0.12.2

type MoonshotAIModelParams struct {
	// Temperature controls the randomness or "creativity" of the model's outputs.
	// Values range from 0.0 to 1.0, with lower values making the output more focused and deterministic.
	// The default value is 0.0.
	// Moonshot AI recommends 0.6 for kimi-k2 models and 1.0 for kimi-k2-thinking models.
	// It is generally recommended to alter this or `TopP` but not both.
	Temperature *float32 `yaml:"temperature" validate:"omitempty,min=0,max=1"`

	// TopP controls diversity via nucleus sampling.
	// Values range from 0.0 to 1.0, with lower values making the output more focused.
	// The default value is 1.0.
	// It is generally recommended to alter this or `Temperature` but not both.
	TopP *float32 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// MaxTokens controls the maximum number of tokens available to the model for generating a response.
	MaxTokens *int32 `yaml:"max-tokens" validate:"omitempty,min=0"`

	// PresencePenalty penalizes new tokens based on whether they appear in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use new tokens,
	// increasing the model's likelihood to talk about new topics.
	// The default value is 0.0.
	PresencePenalty *float32 `yaml:"presence-penalty" validate:"omitempty,min=-2,max=2"`

	// FrequencyPenalty penalizes new tokens based on their frequency in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use less frequent tokens,
	// decreasing the model's likelihood to repeat the same line verbatim.
	// The default value is 0.0.
	FrequencyPenalty *float32 `yaml:"frequency-penalty" validate:"omitempty,min=-2,max=2"`
}

MoonshotAIModelParams represents Moonshot AI model-specific settings.

type OpenAIClientConfig

type OpenAIClientConfig struct {
	// APIKey is the API key for the OpenAI provider.
	APIKey string `yaml:"api-key" validate:"required"`
}

OpenAIClientConfig represents OpenAI provider settings.

type OpenAIModelParams

type OpenAIModelParams struct {
	// ReasoningEffort controls effort level on reasoning for reasoning models.
	// Valid values are: "none", "minimal", "low", "medium", "high", "xhigh".
	ReasoningEffort *string `yaml:"reasoning-effort" validate:"omitempty,oneof=none minimal low medium high xhigh"`

	// Verbosity determines how many output tokens are generated.
	// Valid values are: "low", "medium", "high".
	// Note: May not be supported by legacy models.
	Verbosity *string `yaml:"verbosity" validate:"omitempty,oneof=low medium high"`

	// TextResponseFormat indicates whether to use plain-text response format
	// for compatibility with models that do not support JSON.
	TextResponseFormat bool `yaml:"text-response-format" validate:"omitempty"`

	// Temperature controls the randomness or "creativity" of the model's outputs.
	// Values range from 0.0 to 2.0, with lower values making the output more focused and deterministic.
	// The default value is 1.0.
	// It is generally recommended to alter this or `TopP` but not both.
	Temperature *float32 `yaml:"temperature" validate:"omitempty,min=0,max=2"`

	// TopP controls diversity via nucleus sampling.
	// Values range from 0.0 to 1.0, with lower values making the output more focused.
	// The default value is 1.0.
	// It is generally recommended to alter this or `Temperature` but not both.
	TopP *float32 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// PresencePenalty penalizes new tokens based on whether they appear in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use new tokens,
	// increasing the model's likelihood to talk about new topics.
	// The default value is 0.0.
	PresencePenalty *float32 `yaml:"presence-penalty" validate:"omitempty,min=-2,max=2"`

	// FrequencyPenalty penalizes new tokens based on their frequency in the text so far.
	// Values range from -2.0 to 2.0, with positive values encouraging the model to use less frequent tokens,
	// decreasing the model's likelihood to repeat the same line verbatim.
	// The default value is 0.0.
	FrequencyPenalty *float32 `yaml:"frequency-penalty" validate:"omitempty,min=-2,max=2"`

	// MaxCompletionTokens controls the maximum number of tokens available to the model for generating a response,
	// including visible output tokens and reasoning tokens.
	MaxCompletionTokens *int32 `yaml:"max-completion-tokens" validate:"omitempty,min=1"`

	// MaxTokens controls the maximum number of tokens available to the model for generating a response.
	// This field is for internal use only and not exposed in YAML configuration.
	//
	// Deprecated: Use `MaxCompletionTokens` instead for user configuration.
	MaxTokens *int32 `yaml:"-"`

	// Seed makes text generation more deterministic. If specified, the system will
	// attempt to return the same result for the same inputs with the same seed value and parameters.
	// This field is for internal use only and not exposed in YAML configuration.
	Seed *int64 `yaml:"-"`

	// LegacyJsonMode specifies a compatibility mode for JSON response formatting.
	// When set to LegacyJsonSchema, adds format instruction to prompt while keeping json_schema response format.
	// When set to LegacyJsonObject, adds format instruction to prompt and uses json_object response format.
	// When nil, uses default behavior (no format instruction, json_schema response format).
	// This field is for internal use only and not exposed in YAML configuration.
	LegacyJsonMode *LegacyJsonMode `yaml:"-"`
}

OpenAIModelParams represents OpenAI model-specific settings.

type ProviderConfig

type ProviderConfig struct {
	// Name specifies unique identifier of the provider.
	Name string `yaml:"name" validate:"required,oneof=openai google anthropic deepseek mistralai xai alibaba moonshotai"`

	// ClientConfig holds provider-specific client settings.
	ClientConfig ClientConfig `yaml:"client-config" validate:"required"`

	// Runs lists run configurations for this provider.
	Runs []RunConfig `yaml:"runs" validate:"required,unique=Name,dive"`

	// Disabled indicates if all runs should be disabled by default.
	Disabled bool `yaml:"disabled" validate:"omitempty"`

	// RetryPolicy specifies default retry behavior for all runs in this provider.
	RetryPolicy RetryPolicy `yaml:"retry-policy" validate:"omitempty"`
}

ProviderConfig defines settings for an AI provider.

func (ProviderConfig) GetRunsResolved added in v0.4.0

func (pc ProviderConfig) GetRunsResolved() []RunConfig

GetRunsResolved returns runs with retry policies and disabled flags resolved. If RunConfig.RetryPolicy is nil, the parent ProviderConfig.RetryPolicy value is used instead. If RunConfig.Disabled is nil, the parent ProviderConfig.Disabled value is used instead.

func (ProviderConfig) Resolve added in v0.5.0

func (pc ProviderConfig) Resolve(excludeDisabledRuns bool) ProviderConfig

Resolve returns a copy of the provider configuration with runs resolved. If excludeDisabledRuns is true, only enabled runs are included.

func (*ProviderConfig) UnmarshalYAML

func (pc *ProviderConfig) UnmarshalYAML(value *yaml.Node) error

UnmarshalYAML implements custom YAML unmarshaling for ProviderConfig. It handles provider-specific client configuration based on provider name.

type ResponseFormat added in v0.9.0

type ResponseFormat struct {
	// contains filtered or unexported fields
}

ResponseFormat represents the expected format of the AI model's response. It specifies how the model should structure its answer, either as a plain text format instruction or a JSON schema object for structured responses.
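
Since the format can be either an instruction string or a schema object, both YAML shapes are accepted; for example (illustrative values):

```yaml
# Plain text instruction:
response-result-format: "Answer with a single word."

# JSON schema object for a structured response:
response-result-format:
  type: object
  properties:
    answer:
      type: string
  required: [answer]
  additionalProperties: false
```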

func NewResponseFormat added in v0.9.0

func NewResponseFormat(value interface{}) ResponseFormat

NewResponseFormat creates a ResponseFormat from an instruction string or schema object.

func (ResponseFormat) AsSchema added in v0.9.0

func (r ResponseFormat) AsSchema() (schema map[string]interface{}, ok bool)

AsSchema returns the JSON schema object and true if this is a schema format; otherwise ok is false.

func (ResponseFormat) AsString added in v0.9.0

func (r ResponseFormat) AsString() (value string, ok bool)

AsString returns the string instruction and true if this is a string format; otherwise ok is false.

func (ResponseFormat) MarshalYAML added in v0.9.0

func (r ResponseFormat) MarshalYAML() (interface{}, error)

MarshalYAML implements custom YAML marshaling for ResponseFormat.

func (*ResponseFormat) UnmarshalYAML added in v0.9.0

func (r *ResponseFormat) UnmarshalYAML(value *yaml.Node) error

UnmarshalYAML implements custom YAML unmarshaling for ResponseFormat.

type RetryPolicy added in v0.4.0

type RetryPolicy struct {
	// MaxRetryAttempts specifies the maximum number of retry attempts.
	// A value of 0 means no retry attempts will be made.
	MaxRetryAttempts uint `yaml:"max-retry-attempts" validate:"omitempty,min=0"`

	// InitialDelaySeconds specifies the initial delay in seconds before the first retry attempt.
	InitialDelaySeconds int `yaml:"initial-delay-seconds" validate:"omitempty,gt=0"`
}

RetryPolicy defines retry behavior on transient errors.

type RunConfig

type RunConfig struct {
	// Name is a display-friendly identifier shown in results.
	Name string `yaml:"name" validate:"required"`

	// Model specifies the target model's identifier.
	Model string `yaml:"model" validate:"required"`

	// MaxRequestsPerMinute limits the number of API requests per minute sent to this specific model.
	// A value of 0 means no rate limiting will be applied.
	MaxRequestsPerMinute int `yaml:"max-requests-per-minute" validate:"omitempty,numeric,min=0"`

	// Disabled indicates if this run configuration should be skipped.
	// If set, overrides the parent ProviderConfig.Disabled value.
	Disabled *bool `yaml:"disabled" validate:"omitempty"`

	// ModelParams holds any model-specific configuration parameters.
	ModelParams ModelParams `yaml:"model-parameters" validate:"omitempty"`

	// RetryPolicy specifies retry behavior on transient errors.
	// If set, overrides the parent ProviderConfig.RetryPolicy value.
	RetryPolicy *RetryPolicy `yaml:"retry-policy" validate:"omitempty"`
}

RunConfig defines settings for a single run configuration.
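
A sketch of a single run entry in YAML (model identifier and limits are placeholders; `model-parameters` keys depend on the provider):

```yaml
name: "Grok (low temperature)"
model: grok-2                   # placeholder model identifier
max-requests-per-minute: 30     # 0 disables rate limiting
retry-policy:                   # overrides the provider-level policy
  max-retry-attempts: 2
  initial-delay-seconds: 10
model-parameters:
  temperature: 0.2
```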

type SystemPrompt added in v0.8.0

type SystemPrompt struct {
	// Template is the template string for the system prompt.
	// It can reference `{{.ResponseResultFormat}}` to include the task's response format.
	Template *string `yaml:"template" validate:"omitempty"`

	// EnableFor controls when system prompt should be sent to AI models.
	// - "all": system prompt is sent for all tasks
	// - "text": system prompt is sent only for tasks with plain text response format
	// - "none": system prompt is never sent
	// Defaults to "text" when not specified.
	EnableFor *SystemPromptEnabledFor `yaml:"enable-for" validate:"omitempty,oneof=all text none"`
}

SystemPrompt represents a system prompt configuration.
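
An example configuration in YAML (the template text is illustrative):

```yaml
system-prompt:
  template: |
    You are a careful assistant. Answer concisely.
    {{.ResponseResultFormat}}
  enable-for: text   # default when omitted
```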

func (SystemPrompt) GetEnableFor added in v0.9.0

func (s SystemPrompt) GetEnableFor() SystemPromptEnabledFor

GetEnableFor returns the EnableFor value, defaulting to EnableForText if not set.

func (SystemPrompt) GetTemplate added in v0.8.0

func (s SystemPrompt) GetTemplate() (template string, ok bool)

GetTemplate returns the template string and true if it is set and not blank.

func (SystemPrompt) MergeWith added in v0.8.0

func (these SystemPrompt) MergeWith(other *SystemPrompt) SystemPrompt

MergeWith merges this system prompt with another and returns the result. The provided other values override these values if set.

type SystemPromptEnabledFor added in v0.9.0

type SystemPromptEnabledFor string

SystemPromptEnabledFor represents the enabled state for system prompt.

const (
	// EnableForAll enables system prompt for all tasks.
	EnableForAll SystemPromptEnabledFor = "all"
	// EnableForText enables system prompt only for tasks with plain text response format.
	EnableForText SystemPromptEnabledFor = "text"
	// EnableForNone disables system prompt for all tasks.
	EnableForNone SystemPromptEnabledFor = "none"
)

SystemPromptEnabledFor constants define when system prompt should be sent.

type Task

type Task struct {
	// Name is a display-friendly identifier shown in results.
	Name string `yaml:"name" validate:"required"`

	// Prompt that will be sent to the AI model.
	Prompt string `yaml:"prompt" validate:"required"`

	// ResponseResultFormat specifies how the AI should format the final answer to the prompt.
	// Can be either a plain text instruction or a JSON schema object.
	ResponseResultFormat ResponseFormat `yaml:"response-result-format" validate:"required"`

	// ExpectedResult is the set of accepted valid answers for the prompt.
	// For plain text format: contains string values that must follow the `ResponseResultFormat` instruction precisely.
	// For structured schema format: contains object values that must be valid according to the `ResponseResultFormat` schema.
	// Only one needs to match for the response to be considered correct.
	ExpectedResult utils.ValueSet `yaml:"expected-result" validate:"required"`

	// Disabled indicates whether this specific task should be skipped.
	// If set, overrides the global TaskConfig.Disabled value.
	Disabled *bool `yaml:"disabled" validate:"omitempty"`

	// ValidationRules are validation settings for this specific task.
	// If set, overrides the global TaskConfig.ValidationRules values.
	ValidationRules *ValidationRules `yaml:"validation-rules" validate:"omitempty"`

	// SystemPrompt is the system prompt configuration for this specific task.
	// If set, overrides the global TaskConfig.SystemPrompt values.
	SystemPrompt *SystemPrompt `yaml:"system-prompt" validate:"omitempty"`

	// Files is a list of files to be included with the prompt.
	// This is primarily used for images but can support other file types
	// depending on the provider's capabilities.
	Files []TaskFile `yaml:"files" validate:"omitempty,unique=Name,dive"`

	// ToolSelector is the tool selector configuration for this specific task.
	// If set, overrides the global TaskConfig.ToolSelector values.
	ToolSelector *ToolSelector `yaml:"tool-selector" validate:"omitempty"`
	// contains filtered or unexported fields
}

Task defines a single test case to be executed by AI models.
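
A minimal task sketch in YAML (values are illustrative; `expected-result` is shown as a list of accepted answers):

```yaml
- name: "Capital of France"
  prompt: "What is the capital of France?"
  response-result-format: "Answer with the city name only."
  expected-result:
    - "Paris"
```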

func (Task) GetResolvedSystemPrompt added in v0.8.0

func (t Task) GetResolvedSystemPrompt() (prompt string, ok bool)

GetResolvedSystemPrompt returns the resolved system prompt template for this task and true if it is not blank.

func (Task) GetResolvedToolSelector added in v0.11.0

func (t Task) GetResolvedToolSelector() ToolSelector

GetResolvedToolSelector returns the resolved tool selector for this task.

func (Task) GetResolvedValidationRules added in v0.10.0

func (t Task) GetResolvedValidationRules() ValidationRules

GetResolvedValidationRules returns the resolved validation rules for this task.

func (*Task) ResolveSystemPrompt added in v0.8.0

func (t *Task) ResolveSystemPrompt(defaultConfig SystemPrompt) error

ResolveSystemPrompt resolves the system prompt template for this task using the provided default. The resolved template can be retrieved using GetResolvedSystemPrompt().

func (*Task) ResolveToolSelector added in v0.11.0

func (t *Task) ResolveToolSelector(defaultSelector ToolSelector)

ResolveToolSelector resolves the tool selector for this task using the provided default. The resolved selector can be retrieved using GetResolvedToolSelector().

func (*Task) ResolveValidationRules added in v0.10.0

func (t *Task) ResolveValidationRules(defaultConfig ValidationRules) error

ResolveValidationRules resolves the validation rules for this task using the provided default. The resolved rules can be retrieved using GetResolvedValidationRules().

func (*Task) SetBaseFilePath

func (t *Task) SetBaseFilePath(basePath string) error

SetBaseFilePath sets the base path for all local files in the task. The resolved paths are validated to ensure they are accessible.

type TaskConfig

type TaskConfig struct {
	// Tasks is a list of tasks to be executed.
	Tasks []Task `yaml:"tasks" validate:"required,unique=Name,dive"`

	// Disabled indicates whether all tasks should be disabled by default.
	// Individual tasks can override this setting.
	Disabled bool `yaml:"disabled" validate:"omitempty"`

	// ValidationRules are default validation settings for all tasks.
	// Individual tasks can override these settings.
	ValidationRules ValidationRules `yaml:"validation-rules" validate:"omitempty"`

	// SystemPrompt is the default system prompt configuration for all tasks.
	// Individual tasks can override this configuration.
	SystemPrompt SystemPrompt `yaml:"system-prompt" validate:"omitempty"`

	// ToolSelector is the default tool selector configuration for all tasks.
	// Individual tasks can override this configuration.
	ToolSelector ToolSelector `yaml:"tool-selector" validate:"omitempty"`
}

TaskConfig represents task definitions and global settings.

func (TaskConfig) GetEnabledTasks

func (o TaskConfig) GetEnabledTasks() []Task

GetEnabledTasks returns a filtered list of tasks that are not disabled. If Task.Disabled is nil, the global TaskConfig.Disabled value is used instead.

func (TaskConfig) Validate added in v0.9.0

func (o TaskConfig) Validate() error

Validate validates all tasks for internal consistency. Returns an error if any task has incompatible configuration.

type TaskFile

type TaskFile struct {
	// Name is a unique identifier for the file, used to reference it in prompts.
	Name string `yaml:"name" validate:"required"`

	// URI is the path or URL to the file.
	URI URI `yaml:"uri" validate:"required"`

	// Type is the MIME type of the file.
	// If not provided, it will be inferred from the file extension or content.
	Type string `yaml:"type" validate:"omitempty"`
	// contains filtered or unexported fields
}

TaskFile represents a file to be included with a task.
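
Example `files` entries (paths and URL are placeholders; `type` may be omitted and inferred):

```yaml
files:
  - name: diagram.png
    uri: assets/diagram.png            # local path, resolved against the base path
  - name: photo.jpg
    uri: https://example.com/photo.jpg # remote file, fetched on access
    type: image/jpeg
```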

func (*TaskFile) Base64

func (f *TaskFile) Base64(ctx context.Context) (string, error)

Base64 returns the base64-encoded file content, loading it on demand.

func (*TaskFile) Content

func (f *TaskFile) Content(ctx context.Context) ([]byte, error)

Content returns the raw file content, loading it on demand.

func (*TaskFile) GetDataURL

func (f *TaskFile) GetDataURL(ctx context.Context) (string, error)

GetDataURL returns a complete data URL for the file (e.g., "data:image/png;base64,...").

func (*TaskFile) SetBasePath

func (f *TaskFile) SetBasePath(basePath string)

SetBasePath sets the base path used to resolve relative local paths.

func (*TaskFile) TypeValue

func (f *TaskFile) TypeValue(ctx context.Context) (string, error)

TypeValue returns the MIME type, inferring it if not set, loading content if needed.

func (*TaskFile) UnmarshalYAML

func (f *TaskFile) UnmarshalYAML(value *yaml.Node) error

UnmarshalYAML implements custom YAML unmarshaling for TaskFile.

func (*TaskFile) Validate

func (f *TaskFile) Validate() error

Validate checks if a local file exists, is accessible, and is not a directory. Remote files are not validated as they will be checked when accessed.

type Tasks

type Tasks struct {
	// TaskConfig contains all task definitions and settings.
	TaskConfig TaskConfig `yaml:"task-config" validate:"required"`
}

Tasks represents the top-level task configuration structure.
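
A complete, minimal task file in the shape consumed by LoadTasksFromFile (contents illustrative):

```yaml
task-config:
  disabled: false
  validation-rules:
    case-sensitive: false
  tasks:
    - name: "Simple arithmetic"
      prompt: "What is 2 + 2?"
      response-result-format: "Answer with a single number."
      expected-result:
        - "4"
```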

func LoadTasksFromFile

func LoadTasksFromFile(ctx context.Context, path string) (*Tasks, error)

LoadTasksFromFile reads and validates task definitions from the specified file path. Returns error if the file cannot be read or contains invalid task definitions.

type ToolConfig added in v0.11.0

type ToolConfig struct {
	// Name is the unique identifier for the tool.
	Name string `yaml:"name" validate:"required"`
	// Image is the name of the Docker image to use for the tool.
	Image string `yaml:"image" validate:"required"`
	// Description describes what the tool does. For optimal LLM understanding and tool selection,
	// provide extremely detailed descriptions including:
	// - What the tool does and its primary purpose
	// - When it should be used (and when it shouldn't)
	// - What each parameter in the schema means and how it affects behavior
	// - Any important caveats, limitations, or side effects
	// - Examples of usage if helpful
	// Aim for 3-4 sentences per tool description. Be specific and avoid ambiguity
	// to help the LLM choose the correct tool and provide appropriate parameters.
	Description string `yaml:"description" validate:"required"`
	// Parameters is the JSON schema for the tool's input parameters. Follow these best practices
	// to improve LLM parameter generation accuracy:
	// - Use standard JSON Schema format with detailed "description" fields for each parameter
	// - Specify precise types (string, integer, boolean, array, object)
	// - Use "enum" arrays for parameters with fixed sets of allowed values
	// - Include examples and constraints in parameter descriptions (e.g., "The city name, e.g., 'San Francisco'")
	// - Clearly mark all required parameters in the "required" array
	// - Use "additionalProperties": false for objects to prevent unexpected parameters
	// - Provide comprehensive descriptions that explain parameter purpose and format
	Parameters map[string]interface{} `yaml:"parameters" validate:"required"`
	// ParameterFiles maps parameter field names to file paths where argument values should be written.
	// This allows passing large or complex data to tools via files instead of inline JSON.
	// The tool's command should read these files as needed.
	ParameterFiles map[string]string `yaml:"parameter-files,omitempty"`
	// AuxiliaryDir specifies the directory path where task files will be automatically available.
	// If set, all files attached to a task will be copied to this directory using each file's
	// `TaskFile.Name` exactly as provided.
	// This directory is ephemeral: files are reset between tool calls and do not persist
	// across multiple invocations.
	AuxiliaryDir string `yaml:"auxiliary-dir,omitempty"`
	// SharedDir specifies the directory path that persists across all tool calls within a single task.
	// If set, files created in this directory will be available for any subsequent tool calls but
	// will be removed when the task completes.
	SharedDir string `yaml:"shared-dir,omitempty"`
	// Command specifies the command to execute as a list of its components.
	Command []string `yaml:"command,omitempty"`
	// Env specifies additional environment variables to set.
	Env map[string]string `yaml:"env,omitempty"`
}

ToolConfig represents the configuration for a tool.
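
A hypothetical tool definition illustrating the fields above (image, paths, and command are placeholders; argument delivery relies on `parameter-files` as described):

```yaml
name: word-count
image: alpine:3.20                     # placeholder image
description: >
  Counts the words in the provided text and prints the count. Use it when a
  task needs an exact word count; do not use it for summarization or
  interpretation. The "text" parameter is the raw text to count. The tool is
  read-only and has no side effects.
parameters:
  type: object
  properties:
    text:
      type: string
      description: "The raw text whose words should be counted."
  required: [text]
  additionalProperties: false
parameter-files:
  text: /work/input.txt                # the "text" argument is written here
command: ["sh", "-c", "wc -w < /work/input.txt"]
```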

type ToolSelection added in v0.11.0

type ToolSelection struct {
	// Name of the tool to select.
	Name string `yaml:"name" validate:"required"`
	// Disabled determines whether this specific tool is disabled.
	// If nil, uses the value from the ToolSelector.
	Disabled *bool `yaml:"disabled" validate:"omitempty"`
	// MaxCalls is the maximum number of times this tool can be called per task.
	// If nil, there is no limit.
	MaxCalls *int `yaml:"max-calls" validate:"omitempty,min=1"`
	// Timeout is the timeout for a single tool invocation.
	// If nil, there is no timeout.
	Timeout *time.Duration `yaml:"timeout" validate:"omitempty"`
	// MaxMemoryMB is the maximum memory limit in MB available for this tool per invocation.
	// If nil, there is no memory limit.
	MaxMemoryMB *int `yaml:"max-memory-mb" validate:"omitempty,min=1"`
	// CpuPercent is the CPU limit as a percentage of total host CPU (0-100) per invocation.
	// If nil, there is no CPU limit.
	CpuPercent *int `yaml:"cpu-percent" validate:"omitempty,min=1,max=100"`
}

ToolSelection represents the selection and configuration of a tool for a task.

type ToolSelector added in v0.11.0

type ToolSelector struct {
	// Disabled determines whether tools are disabled for the task.
	// Individual tools can override this setting.
	Disabled *bool `yaml:"disabled" validate:"omitempty"`
	// Tools lists the tools to be available in the task execution.
	Tools []ToolSelection `yaml:"tools" validate:"omitempty,unique=Name,dive"`
}

ToolSelector defines settings for using tools in task execution.
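
An example in YAML (the tool name must match a configured ToolConfig; the `timeout` duration syntax shown is an assumption):

```yaml
tool-selector:
  disabled: false
  tools:
    - name: word-count
      max-calls: 3
      timeout: 30s        # duration syntax is an assumption
      max-memory-mb: 256
      cpu-percent: 50
```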

func (ToolSelector) GetEnabledToolsByName added in v0.11.0

func (ts ToolSelector) GetEnabledToolsByName() (map[string]ToolSelection, bool)

GetEnabledToolsByName returns the map of tools that are not disabled and a boolean indicating if any tools are enabled. For each tool, if ToolSelection.Disabled is nil, uses the ToolSelector.Disabled value.

func (ToolSelector) MergeWith added in v0.11.0

func (these ToolSelector) MergeWith(other *ToolSelector) ToolSelector

MergeWith merges this tool selector with another and returns the result. The provided other values override these values if set.

type URI

type URI struct {
	// contains filtered or unexported fields
}

URI represents a parsed URI/URL that can be used to reference a file.

func (URI) IsLocalFile

func (u URI) IsLocalFile() bool

IsLocalFile checks if the URI references a local file.

func (URI) IsRemoteFile

func (u URI) IsRemoteFile() bool

IsRemoteFile checks if the URI references a remote file.

func (URI) MarshalYAML

func (u URI) MarshalYAML() (interface{}, error)

MarshalYAML implements custom YAML marshaling for URI.

func (*URI) Parse

func (u *URI) Parse(raw string) (err error)

Parse parses a raw URI string into a structured URI object. It validates that the URI scheme is supported.

func (URI) Path

func (u URI) Path(basePath string) string

Path returns the filesystem path for local URIs. For relative local paths, it uses the provided basePath to create an absolute path.

func (URI) String

func (u URI) String() string

String returns the original raw URI string.

func (URI) URL

func (u URI) URL() *url.URL

URL returns the parsed URL.

func (*URI) UnmarshalYAML

func (u *URI) UnmarshalYAML(value *yaml.Node) error

UnmarshalYAML implements custom YAML unmarshaling for URI.

type ValidationRules added in v0.3.0

type ValidationRules struct {
	// CaseSensitive determines whether string comparison should be case-sensitive.
	CaseSensitive *bool `yaml:"case-sensitive" validate:"omitempty"`

	// IgnoreWhitespace determines whether all whitespace should be ignored during comparison.
	// When true, all whitespace characters (spaces, tabs, newlines) are removed before comparison.
	IgnoreWhitespace *bool `yaml:"ignore-whitespace" validate:"omitempty"`

	// TrimLines determines whether to trim leading and trailing whitespace of each line
	// before comparison.
	TrimLines *bool `yaml:"trim-lines" validate:"omitempty"`

	// Judge specifies the judge configuration to use for evaluation.
	// When enabled, an LLM will be used to evaluate the correctness of the response
	// instead of simple string matching.
	Judge JudgeSelector `yaml:"judge" validate:"omitempty"`
}

ValidationRules represents task validation rules. It controls how model responses should be validated against expected results.
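
For example, to compare answers case-insensitively while ignoring whitespace (judge left at its default):

```yaml
validation-rules:
  case-sensitive: false
  ignore-whitespace: true
  trim-lines: true
```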

func (ValidationRules) IsCaseSensitive added in v0.3.0

func (vr ValidationRules) IsCaseSensitive() bool

IsCaseSensitive returns whether validation should be case sensitive.

func (ValidationRules) IsIgnoreWhitespace added in v0.3.0

func (vr ValidationRules) IsIgnoreWhitespace() bool

IsIgnoreWhitespace returns whether whitespace should be ignored during validation.

func (ValidationRules) IsTrimLines added in v0.6.1

func (vr ValidationRules) IsTrimLines() bool

IsTrimLines returns whether each line should be trimmed before validation.

func (ValidationRules) MergeWith added in v0.3.0

func (these ValidationRules) MergeWith(other *ValidationRules) ValidationRules

MergeWith merges these validation rules with other rules and returns the result. The provided other values override these values if set.

func (ValidationRules) UseJudge added in v0.5.0

func (vr ValidationRules) UseJudge() bool

UseJudge returns whether judge evaluation is enabled.

type XAIClientConfig added in v0.7.2

type XAIClientConfig struct {
	// APIKey is the API key for the xAI provider.
	APIKey string `yaml:"api-key" validate:"required"`
}

XAIClientConfig represents xAI provider settings.

type XAIModelParams added in v0.7.2

type XAIModelParams struct {
	// Temperature controls the randomness or "creativity" of the model's outputs.
	// Notes: Higher values (e.g. 0.8) make outputs more random; lower values
	// (e.g. 0.2) make outputs more focused and deterministic.
	// Valid range: 0.0 — 2.0. Default: 1.0.
	Temperature *float32 `yaml:"temperature" validate:"omitempty,min=0,max=2"`

	// TopP controls diversity via nucleus sampling (probability mass cutoff).
	// Notes: Use either Temperature or TopP, not both, for sampling control.
	// Valid range: 0.0 — 1.0. Default: 1.0.
	TopP *float32 `yaml:"top-p" validate:"omitempty,min=0,max=1"`

	// MaxCompletionTokens controls the maximum number of tokens to generate in the completion.
	MaxCompletionTokens *int32 `yaml:"max-completion-tokens" validate:"omitempty,min=0"`

	// PresencePenalty penalizes new tokens based on whether they appear in the text so far.
	// Notes: Positive values encourage the model to introduce new topics.
	// Valid range: -2.0 — 2.0. Default: 0.0.
	PresencePenalty *float32 `yaml:"presence-penalty" validate:"omitempty,min=-2,max=2"`

	// FrequencyPenalty penalizes new tokens based on their frequency in the text so far.
	// Notes: Positive values discourage repetition.
	// Valid range: -2.0 — 2.0. Default: 0.0.
	FrequencyPenalty *float32 `yaml:"frequency-penalty" validate:"omitempty,min=-2,max=2"`

	// ReasoningEffort constrains how much "reasoning" budget to spend for reasoning-capable models.
	// Notes: Not all reasoning models support this option.
	// Valid values: "low", "high".
	ReasoningEffort *string `yaml:"reasoning-effort" validate:"omitempty,oneof=low high"`

	// Seed requests deterministic sampling when possible.
	// No guaranteed determinism — xAI makes a best-effort to return
	// repeatable outputs for identical inputs when `seed` and other parameters are the same.
	Seed *int32 `yaml:"seed" validate:"omitempty"`
}

XAIModelParams represents xAI model-specific settings.
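
Example `model-parameters` for an xAI run (values illustrative; note the advice above to set either `temperature` or `top-p`, not both):

```yaml
model-parameters:
  temperature: 0.2
  max-completion-tokens: 1024
  reasoning-effort: low
  seed: 42
```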
