evaluate

package
v1.1.0 Latest
Warning

This package is not in the latest version of its module.
Published: May 13, 2025 | License: MIT | Imports: 12 | Imported by: 0

Documentation

Constants

const RepositoryPlainName = "plain"

RepositoryPlainName holds the name of the plain repository.

Variables

var Revision string

Revision holds the Git revision of the current build.

var Version = "1.1.0"

Version holds the current version of the evaluation benchmark.
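How Revision receives its value is not stated here. Since it is declared as an uninitialized string, one common Go approach is to set it at link time with the linker's `-X` flag. The sketch below assumes that approach; the variables are local stand-ins for the documented ones, and `versionString` is a hypothetical helper that is not part of the package.

```go
package main

import "fmt"

// Local stand-ins for the package-level variables documented above.
var (
	// Revision stays empty unless it is set at link time (assumption).
	Revision string
	// Version mirrors the documented default value.
	Version = "1.1.0"
)

// versionString is a hypothetical helper that combines version and revision
// for display. It returns the bare version when no revision is known.
func versionString(version string, revision string) string {
	if revision == "" {
		return version
	}

	return version + "-" + revision
}

func main() {
	fmt.Println(versionString(Version, Revision))
}
```

With this stand-in program, a build such as `go build -ldflags "-X main.Revision=$(git rev-parse --short HEAD)"` would fill in the revision. For the real package, the `-X` argument would need the package's import path in place of `main`.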

Functions

func Evaluate added in v0.5.0

func Evaluate(ctx *Context)

Evaluate runs an evaluation on the given context. The function has no return value; Context.ResultPath names the directory where results should be written.
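The order in which Evaluate processes runs is not documented on this page. The following sketch shows one plausible reading of the Context fields Runs, RunIDStartsAt, and RunsSequential (documented under Types). The types `runConfig` and `step` and the function `plan` are hypothetical, and the interleaving order is an assumption, not the package's confirmed behavior.

```go
package main

import "fmt"

// runConfig mirrors only the run-related fields of Context.
// It is a stand-in for illustration, not the package's type.
type runConfig struct {
	RunIDStartsAt  uint
	Runs           uint
	RunsSequential bool
}

// step is one (model, run ID) pair in the planned order.
type step struct {
	Model string
	RunID uint
}

// plan returns the order in which models and runs would be visited.
// ASSUMPTION: interleaved means "every model once per run", and sequential
// means "all runs of one model before the next model".
func plan(cfg runConfig, models []string) []step {
	steps := make([]step, 0, int(cfg.Runs)*len(models))

	if cfg.RunsSequential {
		for _, model := range models {
			for i := uint(0); i < cfg.Runs; i++ {
				steps = append(steps, step{Model: model, RunID: cfg.RunIDStartsAt + i})
			}
		}

		return steps
	}

	for i := uint(0); i < cfg.Runs; i++ {
		for _, model := range models {
			steps = append(steps, step{Model: model, RunID: cfg.RunIDStartsAt + i})
		}
	}

	return steps
}

func main() {
	cfg := runConfig{RunIDStartsAt: 3, Runs: 2}
	for _, s := range plan(cfg, []string{"model-a", "model-b"}) {
		fmt.Printf("run %d: %s\n", s.RunID, s.Model)
	}
}
```

Under these assumptions, run IDs count upward from RunIDStartsAt, so a repeated evaluation can continue numbering where an earlier one stopped.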

Types

type Context added in v0.5.0

type Context struct {
	// Log holds the logger of the context.
	Log *log.Logger

	// Languages determines which languages should be used for the evaluation, or empty if all languages should be used.
	Languages []evallanguage.Language

	// Models determines which models should be used for the evaluation, or empty if all models should be used.
	Models []evalmodel.Model
	// ProviderForModel holds the models and their associated provider.
	ProviderForModel map[evalmodel.Model]provider.Provider
	// APIReqestAttempts holds the number of allowed API requests per LLM query.
	APIReqestAttempts uint
	// APIRequestTimeout holds the timeout for API requests in seconds.
	APIRequestTimeout uint

	// RepositoryPaths determines which relative repository paths should be used for the evaluation, or empty if all repositories should be used.
	RepositoryPaths []string
	// ResultPath holds the directory path where results should be written to.
	ResultPath string
	// TestdataPath determines the testdata path where all repositories reside grouped by languages.
	TestdataPath string

	// RunIDStartsAt holds the starting index for run IDs, which is used when running an evaluation multiple times.
	RunIDStartsAt uint
	// Runs holds the number of runs to perform.
	Runs uint
	// RunsSequential indicates that interleaved runs are disabled and runs are performed sequentially.
	RunsSequential bool
	// NoDisqualification indicates that models are not to be disqualified if they fail to solve basic language tasks.
	NoDisqualification bool
}

Context holds an evaluation context.
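Two conventions in the field comments can be illustrated in isolation: an empty Languages, Models, or RepositoryPaths slice selects everything, and APIRequestTimeout is a plain number of seconds. The sketch below is a minimal illustration using only the standard library. `selectOrAll` and `timeoutDuration` are hypothetical helpers, the language names are made up, and none of this is taken from the package's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// selectOrAll is a hypothetical helper. It returns filter when it is
// non-empty and all otherwise, mirroring the documented "empty means all"
// convention of Languages, Models, and RepositoryPaths.
func selectOrAll(all []string, filter []string) []string {
	if len(filter) == 0 {
		return all
	}

	return filter
}

// timeoutDuration converts a timeout given in whole seconds, as documented
// for APIRequestTimeout, into a time.Duration.
func timeoutDuration(seconds uint) time.Duration {
	return time.Duration(seconds) * time.Second
}

func main() {
	// Assumed language names, for illustration only.
	available := []string{"golang", "java", "ruby"}

	fmt.Println(selectOrAll(available, nil))
	fmt.Println(selectOrAll(available, []string{"java"}))
	fmt.Println(timeoutDuration(90))
}
```

Converting the timeout to a time.Duration early keeps the unit explicit wherever the value is later passed to standard library calls such as context.WithTimeout.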
