iosystem

package
v0.2.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 18, 2026 License: MIT Imports: 6 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsEOF added in v0.2.0

func IsEOF(docs []*Document) bool

Types

type Document

type Document struct {
	// Key is the simple path-based identity (e.g., "sub/a.txt")
	Key Key

	// Type specifies the content type of the document
	// (e.g., "application/octet-stream", "application/json")
	Type string

	// Reader provides streaming access to document content
	Reader io.Reader

	// Metadata contains additional information about the document
	Metadata Metadata
}

Document represents a single input document with metadata. Documents flow through the pipeline from Source → Processor → Sink.

func EOF added in v0.2.0

func EOF() *Document

func NewDocument

func NewDocument(key Key, contentType string, reader io.Reader) *Document

NewDocument creates a new document with the given key and reader. The content type defaults to application/octet-stream.

func (*Document) EnsureExtension added in v0.2.0

func (d *Document) EnsureExtension()

EnsureExtension updates the document key to have the correct file extension based on the document's content type. If the extension already matches, the key is unchanged.

func (*Document) WithMetadata

func (d *Document) WithMetadata(key, value string) *Document

WithMetadata adds metadata to the document and returns it for chaining.

type Key added in v0.1.14

type Key string

Key is a simple filesystem-style path used as document identity. Keys are relative paths using forward slashes as separators.

Examples:

"a.txt"           - file in root
"sub/a.txt"       - file in subdirectory
"summary/a.txt"   - file with emit prefix

func (Key) SeqID added in v0.2.0

func (k Key) SeqID(id int) Key

type Metadata added in v0.1.14

type Metadata struct {
	ContentType string            // MIME type (e.g., "text/plain", "application/json")
	Extension   string            // File extension for output (e.g., ".txt", ".md")
	Size        int64             // Document size in bytes
	Custom      map[string]string // User-defined attributes
}

Metadata holds document attributes (NOT part of key identity).

type Processor

type Processor interface {
	// Process takes input documents and produces zero or more output documents.
	//   - Single doc processing: len(docs)==1 (normal case)
	//   - Array processing: len(docs)>1 (array mode after collection)
	//   - EOF signal: docs[0].Type == ContentEOF (end of stream)
	// Return empty slice to filter out documents.
	// Return error for processing failures.
	Process(ctx context.Context, docs []*Document) ([]*Document, error)

	// Close releases any resources held by the processor.
	Close() error
}

Processor transforms documents in a pipeline. Processors are monadic: accept array of documents, return array of documents.

A processor can:

  • Transform documents (map)
  • Split documents (flatMap - one doc becomes many)
  • Filter documents (return empty slice)
  • Collect/aggregate documents (ArrayCollector pattern)

Implementations should:

  • Be stateless where possible
  • Return errors for processing failures
  • Release resources in Close()

type Sink

type Sink interface {
	// Write stores a document to the sink's destination.
	// Returns error if the write fails.
	Write(ctx context.Context, doc *Document) error

	// Close finalizes output and releases resources.
	// Should flush any buffered writes.
	Close() error
}

Sink consumes processed documents. Sinks are the exit point of a processing pipeline.

Implementations should:

  • Write documents to their destination
  • Handle errors gracefully
  • Buffer writes if beneficial for performance
  • Flush buffers and release resources in Close()

type Source

type Source interface {
	// Next returns the next document or io.EOF when complete.
	// Other errors indicate retrieval failures.
	Next(ctx context.Context) (*Document, error)

	// Close releases any resources held by the source.
	Close() error
}

Source produces documents from an input source. Sources are the entry point of a processing pipeline.

Implementations should:

  • Return documents one at a time via Next()
  • Return io.EOF when no more documents are available
  • Be safe for single-goroutine use (not necessarily concurrent)
  • Release resources in Close()

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL