eval

command
v0.2.0 Latest

This package is not in the latest version of its module.

Published: Feb 18, 2026 License: Apache-2.0 Imports: 17 Imported by: 0

Documentation

Overview

Command eval runs evaluation suites against a GoReason engine.

ALTAVision usage:

    go run -tags sqlite_fts5 ./cmd/eval \
      --pdf ./docs/ALTAVision.pdf \
      --chat-provider groq \
      --chat-model openai/gpt-oss-120b \
      --difficulty easy

LegalBench-RAG usage:

    go run -tags sqlite_fts5 ./cmd/eval \
      --dataset-type legalbench \
      --corpus-dir ./data/legalbench-rag-mini/corpus \
      --benchmark-file ./data/legalbench-rag-mini/benchmarks/cuad.json \
      --benchmark-file ./data/legalbench-rag-mini/benchmarks/contractnli.json \
      --chat-provider groq \
      --chat-model openai/gpt-oss-120b
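
The LegalBench-RAG invocation above passes --benchmark-file twice. The source of cmd/eval is not shown here, so the following is only a minimal sketch of how a repeatable flag could be collected with the standard flag package. The stringList type and the parseBenchmarkFiles helper are hypothetical names chosen for illustration, not identifiers from this module.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// stringList is a hypothetical flag.Value that appends one entry
// for every occurrence of the flag on the command line.
type stringList []string

// String renders the collected values; it tolerates a nil receiver
// because the flag package may call it on a zero value.
func (s *stringList) String() string {
	if s == nil {
		return ""
	}
	return strings.Join(*s, ",")
}

// Set is called once per occurrence of the flag.
func (s *stringList) Set(v string) error {
	*s = append(*s, v)
	return nil
}

// parseBenchmarkFiles (assumed helper) returns every --benchmark-file
// value found in args, in the order given.
func parseBenchmarkFiles(args []string) ([]string, error) {
	fs := flag.NewFlagSet("eval", flag.ContinueOnError)
	var files stringList
	fs.Var(&files, "benchmark-file", "benchmark JSON file (may be repeated)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return files, nil
}

func main() {
	files, err := parseBenchmarkFiles(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

The flag package treats -benchmark-file and --benchmark-file as equivalent, so the double-dash spelling used in the examples parses without extra handling.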

GDPR usage (Graph RAG):

    go run -tags sqlite_fts5 ./cmd/eval \
      --dataset-type gdpr \
      --pdf ~/Downloads/CELEX_32016R0679_EN_TXT.pdf \
      --chat-provider ollama --chat-model llama3.1:8b \
      --embed-provider openai --embed-model text-embedding-3-small \
      --difficulty all

GDPR full-context baseline (Gemini):

    go run -tags sqlite_fts5 ./cmd/eval \
      --dataset-type gdpr \
      --pdf ~/Downloads/CELEX_32016R0679_EN_TXT.pdf \
      --full-context \
      --fc-provider gemini --fc-model gemini-2.0-flash \
      --difficulty all
