indexing

v1.1.9

This package is not in the latest version of its module.
Published: Apr 6, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package indexing provides the core indexing pipeline for offline data preparation.

This package defines the Indexer interface, which serves as the entry point for processing documents through the parsing, chunking, embedding, and storage stages.
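To make the four stages concrete, here is a toy, in-memory sketch of a parse, chunk, embed, store chain. Every name in it (`chunk`, `parse`, `chunkWords`, `embed`, `store`, `runPipeline`) is hypothetical and is not part of this package; the "embedding" is a placeholder number rather than a real model call. It only illustrates the order in which the stages feed one another.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a hypothetical unit of text produced by the chunking stage.
type chunk struct {
	Text   string
	Vector []float64 // filled in by the embedding stage
}

// parse stands in for the parsing stage: it normalizes whitespace.
func parse(raw string) string {
	return strings.Join(strings.Fields(raw), " ")
}

// chunkWords stands in for the chunking stage: it splits text into
// chunks of at most size words (size values below 1 are treated as 1).
func chunkWords(text string, size int) []chunk {
	if size < 1 {
		size = 1
	}
	words := strings.Fields(text)
	var out []chunk
	for start := 0; start < len(words); start += size {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		out = append(out, chunk{Text: strings.Join(words[start:end], " ")})
	}
	return out
}

// embed stands in for the embedding stage. The one-dimensional "vector" is
// just the chunk length in bytes; a real pipeline would call a model here.
func embed(chunks []chunk) {
	for i := range chunks {
		chunks[i].Vector = []float64{float64(len(chunks[i].Text))}
	}
}

// store stands in for the storage stage: it appends chunks to an in-memory map.
func store(db map[string][]chunk, docID string, chunks []chunk) {
	db[docID] = append(db[docID], chunks...)
}

// runPipeline chains the four stages for a single document.
func runPipeline(db map[string][]chunk, docID, raw string, size int) {
	chunks := chunkWords(parse(raw), size)
	embed(chunks)
	store(db, docID, chunks)
}

func main() {
	db := map[string][]chunk{}
	runPipeline(db, "doc-1", "  one two   three four five ", 2)
	for _, c := range db["doc-1"] {
		fmt.Printf("%q %v\n", c.Text, c.Vector)
	}
}
```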

Types

type FileWatcher

type FileWatcher struct {
	// contains filtered or unexported fields
}

FileWatcher is a file watcher.

func NewFileWatcher

func NewFileWatcher(indexer Indexer, logger logging.Logger) (*FileWatcher, error)

NewFileWatcher creates a file watcher.

func (*FileWatcher) AddConfigs

func (fw *FileWatcher) AddConfigs(configs ...WatchConfig)

AddConfigs adds one or more watch configurations.

func (*FileWatcher) Start

func (fw *FileWatcher) Start() error

Start starts file watching (blocking).

func (*FileWatcher) Stop

func (fw *FileWatcher) Stop() error

Stop stops file watching.
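Because Start blocks, a caller will typically run it in a goroutine and call Stop from elsewhere. The sketch below shows one way to wire that up. It does not import this package: `watcher` is a local interface that mirrors only the documented `Start() error` and `Stop() error` methods, and `stubWatcher` and `runUntil` are hypothetical helpers. It also assumes that Start returns once Stop has been called, which the documentation does not state.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// watcher mirrors the two lifecycle methods documented on *FileWatcher.
type watcher interface {
	Start() error
	Stop() error
}

// stubWatcher is a hypothetical stand-in: Start blocks until Stop is called.
type stubWatcher struct {
	done chan struct{}
	once sync.Once
}

func newStubWatcher() *stubWatcher {
	return &stubWatcher{done: make(chan struct{})}
}

func (s *stubWatcher) Start() error {
	<-s.done
	return nil
}

func (s *stubWatcher) Stop() error {
	s.once.Do(func() { close(s.done) })
	return nil
}

// runUntil runs w.Start in a goroutine and calls w.Stop once quit is closed.
// It returns the error from Start, joined with the error from Stop if any.
func runUntil(w watcher, quit <-chan struct{}) error {
	startErr := make(chan error, 1)
	go func() { startErr <- w.Start() }()

	select {
	case err := <-startErr:
		// Start returned on its own, for example because setup failed.
		return err
	case <-quit:
		stopErr := w.Stop()
		// Assumption: Start unblocks after Stop, so this receive completes.
		return errors.Join(stopErr, <-startErr)
	}
}

func main() {
	quit := make(chan struct{})
	close(quit) // request shutdown immediately for the demo
	fmt.Println("watcher exited, err =", runUntil(newStubWatcher(), quit))
}
```

In a long-running program the `quit` channel would more likely come from `ctx.Done()` on a context created with `signal.NotifyContext`, so that Ctrl-C triggers Stop.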

type Indexer

type Indexer interface {
	// IndexFile processes a single file into the Vector/Graph stores.
	IndexFile(ctx context.Context, filePath string) (*core.IndexingContext, error)

	// IndexDirectory concurrently processes an entire directory.
	IndexDirectory(ctx context.Context, dirPath string, recursive bool) error

	// IndexText indexes plain text content directly (no file parsing required).
	// This is useful for programmatic document management from APIs, databases, etc.
	IndexText(ctx context.Context, text string, metadata ...map[string]any) error

	// IndexTexts indexes multiple plain text contents in batch.
	IndexTexts(ctx context.Context, texts []string, metadata ...map[string]any) error

	// IndexDocuments indexes documents directly into Vector/Doc/Graph stores.
	IndexDocuments(ctx context.Context, docs ...*core.Document) error

	// DeleteDocument removes a document and all its associated chunks and vectors.
	DeleteDocument(ctx context.Context, docID string) error

	// GetDocument retrieves a document by its ID.
	GetDocument(ctx context.Context, docID string) (*core.Document, error)
}

Indexer defines the entry point for the offline data preparation pipeline. It provides methods to index individual files, entire directories, plain text, and pre-built documents into the RAG knowledge base, and to retrieve or delete an indexed document by ID.

type WatchConfig

type WatchConfig struct {
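	Path             string        // directory to watch
	Recursive        bool          // whether to watch subdirectories recursively
	Patterns         []string      // file match patterns (e.g. []string{"*.pdf", "*.md"})
	Exclude          []string      // file patterns to exclude
	DebounceInterval time.Duration // debounce interval, defaults to 500ms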
}

WatchConfig is the configuration for file watching.

Directories

Path Synopsis
Package community provides community detection algorithms for GraphRAG.
parser
csv
log
pdf
ppt
xml
