Documentation
Index ¶
- Constants
- Variables
- func Evaluate(ctx *Context) (assessments report.AssessmentPerModelPerLanguagePerRepository, ...)
- func Repository(logger *log.Logger, resultPath string, model evalmodel.Model, ...) (repositoryAssessment metrics.Assessments, problems []error, err error)
- func ResetTemporaryRepository(logger *log.Logger, path string) (err error)
- func TemporaryRepository(logger *log.Logger, dataPath string) (temporaryRepositoryPath string, cleanup func(), err error)
- type Context
Constants ¶
const RepositoryPlainName = "plain"
RepositoryPlainName holds the name of the plain repository.
Variables ¶
var Version = "0.4.0"
Version holds the current version of the evaluation benchmark.
Functions ¶
func Evaluate ¶ added in v0.5.0
func Evaluate(ctx *Context) (assessments report.AssessmentPerModelPerLanguagePerRepository, totalScore uint64)
Evaluate runs an evaluation on the given context and returns its results.
func Repository ¶ added in v0.5.0
func Repository(logger *log.Logger, resultPath string, model evalmodel.Model, language language.Language, testDataPath string, repositoryName string) (repositoryAssessment metrics.Assessments, problems []error, err error)
Repository evaluates a repository with the given model and language.
func ResetTemporaryRepository ¶ added in v0.5.0
func ResetTemporaryRepository(logger *log.Logger, path string) (err error)
ResetTemporaryRepository resets a temporary repository back to its "initial" commit.
Types ¶
type Context ¶ added in v0.5.0
type Context struct {
// Log holds the logger of the context.
Log *log.Logger
// Languages determines which languages should be used for the evaluation, or empty if all languages should be used.
Languages []evallanguage.Language
// Models determines which models should be used for the evaluation, or empty if all models should be used.
Models []evalmodel.Model
// ProviderForModel holds the models and their associated provider.
ProviderForModel map[evalmodel.Model]provider.Provider
// QueryAttempts holds the number of query attempts to perform when a model request errors in the process of solving a task.
QueryAttempts uint
// RepositoryPaths determines which relative repository paths should be used for the evaluation, or empty if all repositories should be used.
RepositoryPaths []string
// ResultPath holds the directory path where results should be written to.
ResultPath string
// TestdataPath determines the testdata path where all repositories reside grouped by languages.
TestdataPath string
// Runs holds the number of runs to perform.
Runs uint
// RunsSequential indicates that interleaved runs are disabled and runs are performed sequentially.
RunsSequential bool
// NoDisqualification indicates that models are not to be disqualified if they fail to solve basic language tasks.
NoDisqualification bool
}
Context holds an evaluation context.