package evaluation

Version: v1.1.0 (latest) | Published: Mar 20, 2026 | License: MIT | Imports: 7 | Imported by: 0

Repository

github.com/DotNetAge/gorag

Documentation ¶

Index ¶

Constants
func NewCRAGEvaluator(llm chat.Client) core.CRAGEvaluator
func NewRAGEvaluator(llm chat.Client) core.RAGEvaluator
type BenchmarkResult
- func RunBenchmark(ctx context.Context, retriever core.Retriever, judge LLMJudge, ...) (*BenchmarkResult, error)
- func (r *BenchmarkResult) Summary() string
type CaseResult
type LLMJudge
type Label
type RagasLLMJudge
- func NewRagasLLMJudge(judgeLLM chat.Client) *RagasLLMJudge
type TestCase

Constants ¶

const (
	Relevant   = core.CRAGRelevant
	Irrelevant = core.CRAGIrrelevant
	Ambiguous  = core.CRAGAmbiguous
)

Variables ¶

This section is empty.

Functions ¶

func NewCRAGEvaluator ¶

func NewCRAGEvaluator(llm chat.Client) core.CRAGEvaluator

func NewRAGEvaluator ¶

func NewRAGEvaluator(llm chat.Client) core.RAGEvaluator

Types ¶

type BenchmarkResult ¶

type BenchmarkResult struct {
	TotalCases      int           `json:"total_cases"`
	AvgFaithfulness float32       `json:"avg_faithfulness"`
	AvgRelevance    float32       `json:"avg_relevance"`
	AvgPrecision    float32       `json:"avg_precision"`
	TotalDuration   time.Duration `json:"total_duration"`
	Results         []CaseResult  `json:"results"`
}

BenchmarkResult holds the overall results of a benchmark run.

func RunBenchmark ¶

func RunBenchmark(ctx context.Context, retriever core.Retriever, judge LLMJudge, cases []TestCase, topK int) (*BenchmarkResult, error)

RunBenchmark executes a full evaluation suite against a retriever.
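
The page does not show how RunBenchmark fills the aggregate fields. The sketch below is one plausible reading, not the package's implementation: it assumes the Avg* fields are arithmetic means over the per-case scores and that TotalDuration is the sum of per-case durations. The structs are local copies of the documented ones (JSON tags omitted) so the example compiles on its own, and `aggregate` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// CaseResult is a local copy of the documented struct, without JSON tags.
type CaseResult struct {
	Query             string
	Answer            string
	FaithfulnessScore float32
	RelevanceScore    float32
	PrecisionScore    float32
	Duration          time.Duration
}

// BenchmarkResult is a local copy of the documented struct, without JSON tags.
type BenchmarkResult struct {
	TotalCases      int
	AvgFaithfulness float32
	AvgRelevance    float32
	AvgPrecision    float32
	TotalDuration   time.Duration
	Results         []CaseResult
}

// aggregate is a hypothetical helper: it averages the three scores and sums
// the durations. Both choices are assumptions about RunBenchmark's behavior.
func aggregate(results []CaseResult) BenchmarkResult {
	out := BenchmarkResult{TotalCases: len(results), Results: results}
	if len(results) == 0 {
		return out // avoid dividing by zero; averages stay at 0
	}
	var faith, rel, prec float32
	for _, c := range results {
		faith += c.FaithfulnessScore
		rel += c.RelevanceScore
		prec += c.PrecisionScore
		out.TotalDuration += c.Duration
	}
	n := float32(len(results))
	out.AvgFaithfulness = faith / n
	out.AvgRelevance = rel / n
	out.AvgPrecision = prec / n
	return out
}

func main() {
	// Made-up scores, chosen only to illustrate the arithmetic.
	res := aggregate([]CaseResult{
		{Query: "q1", FaithfulnessScore: 1, RelevanceScore: 0.5, PrecisionScore: 0.5, Duration: 2 * time.Second},
		{Query: "q2", FaithfulnessScore: 0.5, RelevanceScore: 0.5, PrecisionScore: 1, Duration: time.Second},
	})
	fmt.Printf("cases=%d faithfulness=%.2f relevance=%.2f precision=%.2f total=%s\n",
		res.TotalCases, res.AvgFaithfulness, res.AvgRelevance, res.AvgPrecision, res.TotalDuration)
	// prints: cases=2 faithfulness=0.75 relevance=0.50 precision=0.75 total=3s
}
```

The empty-input guard matters for any mean-based aggregate: without it, a run with zero cases would produce NaN averages.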

func (*BenchmarkResult) Summary ¶

func (r *BenchmarkResult) Summary() string

Summary returns a human-readable summary of the benchmark.

type CaseResult ¶

type CaseResult struct {
	Query             string        `json:"query"`
	Answer            string        `json:"answer"`
	FaithfulnessScore float32       `json:"faithfulness"`
	RelevanceScore    float32       `json:"relevance"`
	PrecisionScore    float32       `json:"precision"`
	Duration          time.Duration `json:"duration"`
}

CaseResult holds the evaluation result for a single test case.
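
Because Duration is a time.Duration and the struct carries plain JSON tags, encoding/json writes the field as an integer number of nanoseconds rather than a string such as "1.5s". The sketch below shows this with a local copy of CaseResult; the sample values and the helper names `encodeCase` and `sampleJSON` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// CaseResult is a local copy of the documented struct, including its JSON tags.
type CaseResult struct {
	Query             string        `json:"query"`
	Answer            string        `json:"answer"`
	FaithfulnessScore float32       `json:"faithfulness"`
	RelevanceScore    float32       `json:"relevance"`
	PrecisionScore    float32       `json:"precision"`
	Duration          time.Duration `json:"duration"`
}

// encodeCase marshals one result to a JSON string.
func encodeCase(c CaseResult) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", fmt.Errorf("encode case result: %w", err)
	}
	return string(b), nil
}

// sampleJSON encodes a made-up result with a 1.5 second duration.
func sampleJSON() (string, error) {
	return encodeCase(CaseResult{
		Query:             "q",
		Answer:            "a",
		FaithfulnessScore: 1,
		RelevanceScore:    0.5,
		PrecisionScore:    0.25,
		Duration:          1500 * time.Millisecond,
	})
}

func main() {
	s, err := sampleJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// prints: {"query":"q","answer":"a","faithfulness":1,"relevance":0.5,"precision":0.25,"duration":1500000000}
}
```

Consumers reading this JSON from another language need to divide the duration by 1e9 to get seconds.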

type LLMJudge ¶

type LLMJudge interface {
	// EvaluateFaithfulness checks if the generated answer is strictly grounded in the retrieved chunks.
	EvaluateFaithfulness(ctx context.Context, query string, chunks []*core.Chunk, answer string) (score float32, reason string, err error)

	// EvaluateAnswerRelevance checks if the answer effectively addresses the user's intent.
	EvaluateAnswerRelevance(ctx context.Context, query string, answer string) (score float32, reason string, err error)

	// EvaluateContextPrecision checks if the retrieved context actually contains the useful information.
	EvaluateContextPrecision(ctx context.Context, query string, chunks []*core.Chunk) (score float32, reason string, err error)
}

LLMJudge defines RAGAS-style evaluation metrics (faithfulness, answer relevance, and context precision), each scored by an LLM acting as the evaluator.
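
Since the judge is an interface, a deterministic stub can stand in for a real LLM in unit tests. The sketch below is self-contained and makes several assumptions: `Chunk` is a hypothetical stand-in for core.Chunk (whose fields are not shown on this page), `Judge` repeats the three method signatures with that stand-in, and the substring-based scoring is a toy rule, not how RagasLLMJudge grades.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Chunk is a hypothetical stand-in for core.Chunk; the real type's fields
// are not shown on this page.
type Chunk struct {
	Content string
}

// Judge repeats the three LLMJudge method signatures with Chunk swapped in.
type Judge interface {
	EvaluateFaithfulness(ctx context.Context, query string, chunks []*Chunk, answer string) (score float32, reason string, err error)
	EvaluateAnswerRelevance(ctx context.Context, query string, answer string) (score float32, reason string, err error)
	EvaluateContextPrecision(ctx context.Context, query string, chunks []*Chunk) (score float32, reason string, err error)
}

// keywordJudge scores with substring matching instead of calling an LLM.
type keywordJudge struct{}

// Compile-time check that keywordJudge satisfies Judge.
var _ Judge = keywordJudge{}

// EvaluateFaithfulness returns 1 if the answer appears verbatim in any chunk.
func (keywordJudge) EvaluateFaithfulness(ctx context.Context, _ string, chunks []*Chunk, answer string) (float32, string, error) {
	if err := ctx.Err(); err != nil {
		return 0, "", err
	}
	for _, c := range chunks {
		if c != nil && answer != "" && strings.Contains(c.Content, answer) {
			return 1, "answer appears verbatim in a chunk", nil
		}
	}
	return 0, "answer not found in any chunk", nil
}

// EvaluateAnswerRelevance returns 1 for any non-blank answer.
func (keywordJudge) EvaluateAnswerRelevance(ctx context.Context, _ string, answer string) (float32, string, error) {
	if err := ctx.Err(); err != nil {
		return 0, "", err
	}
	if strings.TrimSpace(answer) == "" {
		return 0, "empty answer", nil
	}
	return 1, "non-empty answer", nil
}

// EvaluateContextPrecision returns the fraction of chunks mentioning the query.
func (keywordJudge) EvaluateContextPrecision(ctx context.Context, query string, chunks []*Chunk) (float32, string, error) {
	if err := ctx.Err(); err != nil {
		return 0, "", err
	}
	if len(chunks) == 0 {
		return 0, "no chunks retrieved", nil
	}
	hits := 0
	for _, c := range chunks {
		if c != nil && strings.Contains(strings.ToLower(c.Content), strings.ToLower(query)) {
			hits++
		}
	}
	reason := fmt.Sprintf("%d of %d chunks mention the query", hits, len(chunks))
	return float32(hits) / float32(len(chunks)), reason, nil
}

// demoScores runs the stub on made-up data and returns the three scores.
func demoScores() (faith, rel, prec float32, err error) {
	ctx := context.Background()
	var j Judge = keywordJudge{}
	chunks := []*Chunk{{Content: "Go is a compiled language."}, {Content: "Unrelated text."}}
	if faith, _, err = j.EvaluateFaithfulness(ctx, "go", chunks, "compiled language"); err != nil {
		return
	}
	if rel, _, err = j.EvaluateAnswerRelevance(ctx, "go", "compiled language"); err != nil {
		return
	}
	prec, _, err = j.EvaluateContextPrecision(ctx, "go", chunks)
	return
}

func main() {
	fmt.Println(demoScores())
}
```

Against the real package, the same idea would apply to a type implementing LLMJudge with core.Chunk, which could then be passed to RunBenchmark in place of a RagasLLMJudge.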

type Label ¶

type Label = core.CRAGLabel

Label is an alias for core.CRAGLabel, re-exported here for convenience. Prefer using the core type directly.

type RagasLLMJudge ¶

type RagasLLMJudge struct {
	// contains filtered or unexported fields
}

RagasLLMJudge implements the LLMJudge interface using standard RAGAS-style prompts. It leverages a strong LLM (like GPT-4) to grade the pipeline's output.

func NewRagasLLMJudge ¶

func NewRagasLLMJudge(judgeLLM chat.Client) *RagasLLMJudge

func (*RagasLLMJudge) EvaluateAnswerRelevance ¶

func (j *RagasLLMJudge) EvaluateAnswerRelevance(ctx context.Context, query string, answer string) (float32, string, error)

EvaluateAnswerRelevance checks if the answer actually answers the user's question.

func (*RagasLLMJudge) EvaluateContextPrecision ¶

func (j *RagasLLMJudge) EvaluateContextPrecision(ctx context.Context, query string, chunks []*core.Chunk) (float32, string, error)

EvaluateContextPrecision checks the quality of the retrieved chunks.

func (*RagasLLMJudge) EvaluateFaithfulness ¶

func (j *RagasLLMJudge) EvaluateFaithfulness(ctx context.Context, query string, chunks []*core.Chunk, answer string) (float32, string, error)

EvaluateFaithfulness checks for hallucinations against the retrieved context.

type TestCase ¶

type TestCase struct {
	Query       string `json:"query"`
	GroundTruth string `json:"ground_truth,omitempty"` // Optional: What we expect the answer to be
}

TestCase represents a single evaluation entry.
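
The JSON tags make it straightforward to keep a benchmark suite in a file and decode it into a slice of test cases. The sketch below uses a local copy of TestCase so it compiles on its own; `parseCases` is a hypothetical helper and the sample queries are made up. An entry that omits ground_truth decodes to an empty GroundTruth.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TestCase is a local copy of the documented struct, including its JSON tags.
type TestCase struct {
	Query       string `json:"query"`
	GroundTruth string `json:"ground_truth,omitempty"`
}

// parseCases decodes a JSON array of test cases.
func parseCases(data []byte) ([]TestCase, error) {
	var cases []TestCase
	if err := json.Unmarshal(data, &cases); err != nil {
		return nil, fmt.Errorf("decode test cases: %w", err)
	}
	return cases, nil
}

func main() {
	// Made-up suite: the second entry has no ground truth.
	raw := []byte(`[
		{"query": "What is retrieval-augmented generation?", "ground_truth": "Generation conditioned on retrieved documents."},
		{"query": "Which license does the project use?"}
	]`)
	cases, err := parseCases(raw)
	if err != nil {
		panic(err)
	}
	for _, c := range cases {
		fmt.Printf("query=%q hasGroundTruth=%t\n", c.Query, c.GroundTruth != "")
	}
}
```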
