Documentation ¶
Overview ¶
Package vision provides a tool for LLM-based image analysis and text extraction.
Index ¶
- type Inputs
- type Tool
- func (t *Tool) Available() bool
- func (t *Tool) Execute(ctx context.Context, args map[string]any) (string, error)
- func (t *Tool) Name() string
- func (t *Tool) Paths(ctx context.Context, args map[string]any) (read, write []string, err error)
- func (t *Tool) Sandboxable() bool
- func (t *Tool) Schema() tool.Schema
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Inputs ¶
type Inputs struct {
	// Path is the file path to the image or PDF.
	Path string `json:"path" jsonschema:"required,description=Path to the image or PDF file" validate:"required"`
	// Instruction describes what to extract or analyze.
	Instruction string `json:"instruction,omitempty" jsonschema:"description=What to extract or analyze (default: describe or extract text)"`
}
Inputs defines the parameters for the Vision tool.
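Because Execute receives its arguments as a map[string]any, the keys have to match the JSON tags on Inputs. The following is a minimal sketch of how such a map could be decoded, assuming a JSON round trip. The local Inputs copy (with only the json tags kept) and the decodeInputs helper are hypothetical and written for illustration; the package's actual decoding and validation logic is not shown in this documentation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Inputs is a local copy of the documented struct, reduced to its json tags
// so the example compiles on its own.
type Inputs struct {
	Path        string `json:"path"`
	Instruction string `json:"instruction,omitempty"`
}

// decodeInputs is a hypothetical helper (not part of the package). It
// round-trips the args map through JSON so the struct tags decide which
// keys land in which fields, then enforces the required "path" field.
func decodeInputs(args map[string]any) (Inputs, error) {
	var in Inputs
	raw, err := json.Marshal(args)
	if err != nil {
		return in, fmt.Errorf("encode args: %w", err)
	}
	if err := json.Unmarshal(raw, &in); err != nil {
		return in, fmt.Errorf("decode args: %w", err)
	}
	if in.Path == "" {
		return in, errors.New("path is required")
	}
	return in, nil
}

func main() {
	in, err := decodeInputs(map[string]any{
		"path":        "scans/invoice.pdf", // example path, not a real file
		"instruction": "Extract the invoice total",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(in.Path)
	fmt.Println(in.Instruction)
}
```

Leaving out "instruction" is valid in this sketch and yields an empty Instruction; how the tool picks its default wording in that case is not specified here.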
type Tool ¶
type Tool struct {
	tool.Base
	// Cfg is the full config, needed because this tool creates sub-agents
	// via agent.New(), which requires access to agents, providers, modes,
	// and the full BuildAgent pipeline.
	Cfg      config.Config
	CfgPaths config.Paths
	Rt       *config.Runtime
}
Tool implements vision-based image analysis via an LLM.
func (*Tool) Execute ¶
func (t *Tool) Execute(ctx context.Context, args map[string]any) (string, error)
Execute reads an image/PDF, compresses it, sends it to a vision LLM, and returns the response.
func (*Tool) Sandboxable ¶
func (t *Tool) Sandboxable() bool
Sandboxable returns false because the tool makes network calls to an LLM provider.
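A host that runs several tools might consult Sandboxable before deciding where each one executes. The following sketch shows one way that check could look. The sandboxAware interface, the stubTool type, the split function, and the tool names are all hypothetical; how the surrounding framework actually treats non-sandboxable tools is not described in this documentation.

```go
package main

import "fmt"

// sandboxAware is a hypothetical interface covering only the two documented
// methods this example needs.
type sandboxAware interface {
	Name() string
	Sandboxable() bool
}

// stubTool stands in for a real tool so the example runs on its own.
type stubTool struct {
	name        string
	sandboxable bool
}

func (s stubTool) Name() string      { return s.name }
func (s stubTool) Sandboxable() bool { return s.sandboxable }

// split partitions tool names by whether the tool reports that it can run
// inside a sandbox.
func split(tools []sandboxAware) (sandboxed, direct []string) {
	for _, tl := range tools {
		if tl.Sandboxable() {
			sandboxed = append(sandboxed, tl.Name())
		} else {
			direct = append(direct, tl.Name())
		}
	}
	return sandboxed, direct
}

func main() {
	tools := []sandboxAware{
		stubTool{name: "network-tool", sandboxable: false}, // behaves like the vision tool
		stubTool{name: "local-tool", sandboxable: true},
	}
	sandboxed, direct := split(tools)
	fmt.Println("sandboxed:", sandboxed)
	fmt.Println("direct:", direct)
}
```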