processor

package
v0.2.3 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 18, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Constants

const (
	ChunkerNone      = "none"
	ChunkerSentence  = "sentence"
	ChunkerParagraph = "paragraph"
	ChunkerChunk     = "chunk"
	ChunkerTag       = "tag"
)

Chunking strategies accepted by ChunkConfig.Strategy.

Variables

This section is empty.

Functions

func NewChunker

func NewChunker(config ChunkConfig) iosystem.Processor

NewChunker creates a processor that splits documents into chunks. Returns multiple documents from a single input, one per chunk.
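
As a rough illustration of the one-to-many behavior described above, the sketch below splits a single document into one document per paragraph. The Document type, its Path and Content fields, the splitParagraphs helper, the blank-line delimiter, and the "#n" path suffix are all hypothetical stand-ins and not this package's API; the real Chunker relies on github.com/fogfish/scanner.

```go
package main

import (
	"fmt"
	"strings"
)

// Document is a hypothetical stand-in for iosystem.Document.
type Document struct {
	Path    string
	Content string
}

// splitParagraphs returns one document per non-empty paragraph.
// Treating a blank line as the delimiter is an assumption of this sketch.
func splitParagraphs(doc Document) []Document {
	var out []Document
	for _, part := range strings.Split(doc.Content, "\n\n") {
		text := strings.TrimSpace(part)
		if text == "" {
			continue // skip empty fragments left by extra blank lines
		}
		out = append(out, Document{
			// The "#n" suffix is illustrative only.
			Path:    fmt.Sprintf("%s#%d", doc.Path, len(out)),
			Content: text,
		})
	}
	return out
}

func main() {
	in := Document{Path: "notes.txt", Content: "First paragraph.\n\nSecond paragraph.\n\n\n"}
	for _, d := range splitParagraphs(in) {
		fmt.Printf("%s: %q\n", d.Path, d.Content)
	}
}
```

A single input yields a slice of outputs, which is why Process returns []*iosystem.Document rather than a single document.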

func NewCollector added in v0.2.0

func NewCollector(merge bool) iosystem.Processor

NewCollector creates a processor that collects documents into an array.

func NewIdentity

func NewIdentity() iosystem.Processor

NewIdentity creates a processor that passes documents through unchanged.

Types

type Agent

type Agent struct {
	// contains filtered or unexported fields
}

Agent wraps a blueprint agent for use as a pipeline processor. It passes each document through the agent's Prompt() method and returns the agent's response as a new document.

The processor:

  • Reads document content as input
  • Passes it to agent.Prompt()
  • Returns agent response as new document
  • Preserves document path with optional suffix
  • Supports JSON format output from agents

Example:

agent := getAgentFromBlueprint() // any value that satisfies Worker
proc := NewAgent(agent, &AgentConfig{
    Suffix: ".processed",
})
docs, err := proc.Process(ctx, []*iosystem.Document{inputDoc})

func NewAgent

func NewAgent(w Worker, config *AgentConfig) *Agent

NewAgent creates a processor that wraps a blueprint Agent.

func (*Agent) Close

func (p *Agent) Close() error

Close releases resources. For Agent, this is a no-op as the agent lifecycle is managed externally.

func (*Agent) Process

func (p *Agent) Process(ctx context.Context, docs []*iosystem.Document) ([]*iosystem.Document, error)

Process transforms a document by passing its content through the agent.

Input document content is read and passed to agent.Prompt() as:

  • string if content is valid UTF-8
  • []byte otherwise
  • map[string]any if document has metadata that should be templated

The agent's response becomes the content of the output document. Output format depends on agent's configuration (text or JSON).
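
The string versus []byte choice above can be sketched with the standard unicode/utf8 package. The promptInput helper is a hypothetical name, and the metadata-templating case (map[string]any) is left out because its rules are not documented here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// promptInput is a hypothetical helper: it returns a string when the
// content is valid UTF-8 and the raw bytes otherwise.
func promptInput(content []byte) any {
	if utf8.Valid(content) {
		return string(content)
	}
	return content
}

func main() {
	fmt.Printf("%T\n", promptInput([]byte("hello")))    // text input
	fmt.Printf("%T\n", promptInput([]byte{0xff, 0xfe})) // invalid UTF-8 input
}
```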

type AgentConfig

type AgentConfig struct {
	Suffix  string        // Suffix to add to output document path (default: empty)
	Options []chatter.Opt // Chatter options to pass to agent.Prompt() (temperature, max_tokens, etc.)
}

AgentConfig configures the agent processor.

type ChunkConfig

type ChunkConfig struct {
	Strategy       string // "none", "sentence", "paragraph", "chunk", "tag"
	ChunkSize      int    // Size for chunk strategy (default: 1024)
	DelimiterChars string // Delimiter characters (defaults vary by strategy)
	Buffer         int
}

ChunkConfig configures the chunker processor.
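
To make the ChunkSize default concrete, here is a minimal sketch of fixed-size splitting. It assumes that a non-positive ChunkSize falls back to the documented default of 1024 and that size is counted in runes; neither detail is confirmed by this page. The chunkConfig type and fixedChunks helper are local stand-ins, not part of the package.

```go
package main

import "fmt"

// chunkConfig is a local stand-in that mirrors only the ChunkSize field.
type chunkConfig struct {
	ChunkSize int
}

// defaultChunkSize is the default documented for the chunk strategy.
const defaultChunkSize = 1024

// fixedChunks splits text into pieces of at most ChunkSize runes.
// Falling back to the default for non-positive sizes is an assumption.
func fixedChunks(text string, cfg chunkConfig) []string {
	size := cfg.ChunkSize
	if size <= 0 {
		size = defaultChunkSize
	}
	runes := []rune(text)
	var out []string
	for start := 0; start < len(runes); start += size {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		out = append(out, string(runes[start:end]))
	}
	return out
}

func main() {
	fmt.Println(fixedChunks("abcdefghij", chunkConfig{ChunkSize: 4}))
}
```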

type Chunker

type Chunker struct {
	// contains filtered or unexported fields
}

Chunker splits documents into chunks based on a strategy. Integrates with the existing github.com/fogfish/scanner library.

func (*Chunker) Close

func (p *Chunker) Close() error

Close implements iosystem.Processor.

func (*Chunker) Process

func (p *Chunker) Process(ctx context.Context, docs []*iosystem.Document) ([]*iosystem.Document, error)

Process splits the input document into chunks and returns multiple documents.

type Collector added in v0.2.0

type Collector struct {
	// contains filtered or unexported fields
}

Collector collects all input documents and emits them as an array on EOF. This processor enables batch processing mode via the --array CLI flag.

Behavior:

  • Normal documents: collected in memory, returns empty slice
  • EOF document: emits all collected documents as array, returns them
  • After EOF: collection resets for potential reuse

Memory consideration:

  • Buffers ALL documents in memory until EOF
  • Not suitable for very large document streams
  • Use only with explicit --array flag
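
The buffering behavior listed above might look roughly like the sketch below. How the real package marks an EOF document is not shown on this page, so the EOF field is an assumption, as are the Document and collector types; the merge option of NewCollector is ignored.

```go
package main

import "fmt"

// Document is a hypothetical stand-in for iosystem.Document.
// The EOF field is an assumed way of signalling end of stream.
type Document struct {
	Path string
	EOF  bool
}

// collector buffers documents in memory until it sees an EOF document.
type collector struct {
	buf []*Document
}

// Process stores normal documents and returns an empty slice.
// On an EOF document it returns everything collected and resets the buffer.
func (c *collector) Process(docs []*Document) []*Document {
	for _, d := range docs {
		if d.EOF {
			out := c.buf
			c.buf = nil // reset for potential reuse
			return out
		}
		c.buf = append(c.buf, d)
	}
	return []*Document{}
}

func main() {
	c := &collector{}
	fmt.Println(len(c.Process([]*Document{{Path: "a"}})))
	fmt.Println(len(c.Process([]*Document{{Path: "b"}})))
	fmt.Println(len(c.Process([]*Document{{EOF: true}})))
}
```

Because nothing is released until EOF, memory use grows with the number of buffered documents, which matches the caution above about large streams.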

func (*Collector) Close added in v0.2.0

func (p *Collector) Close() error

Close finalizes collection.

func (*Collector) Process added in v0.2.0

func (p *Collector) Process(ctx context.Context, docs []*iosystem.Document) ([]*iosystem.Document, error)

Process collects documents or emits array on EOF signal.

  • Normal documents: collected, returns an empty slice (stops propagation until EOF)
  • EOF document: emits the collected []*Document array (monadic: passes documents directly)

type Identity

type Identity struct{}

Identity is a pass-through processor that returns documents unchanged. Useful for testing and as a base for more complex processors.

func (*Identity) Close

func (p *Identity) Close() error

Close implements iosystem.Processor.

func (*Identity) Process

func (p *Identity) Process(ctx context.Context, docs []*iosystem.Document) ([]*iosystem.Document, error)

Process returns the document unchanged in a slice.

type Prompter

type Prompter struct {
	// contains filtered or unexported fields
}

Prompter wraps an LLM as a processor.

func NewPrompter

func NewPrompter(llm chatter.Chatter) *Prompter

NewPrompter creates a processor that wraps an LLM (chatter.Chatter).

func (*Prompter) Close

func (p *Prompter) Close() error

Close releases resources. For Prompter, this is a no-op as the LLM lifecycle is managed externally.

func (*Prompter) Process

func (p *Prompter) Process(ctx context.Context, docs []*iosystem.Document) ([]*iosystem.Document, error)

Process transforms a document by passing its content through the agent.

Input document content is read and passed to agent.Prompt() as:

  • string if content is valid UTF-8
  • []byte otherwise
  • map[string]any if document has metadata that should be templated

The agent's response becomes the content of the output document. Output format depends on agent's configuration (text or JSON).

type Worker

type Worker interface {
	Prompt(ctx context.Context, in runtime.Event, opt ...chatter.Opt) (runtime.Event, error)
}

Worker abstracts the agent wrapped by Agent. Any type with a matching Prompt method satisfies it.
