indexing

package

v1.1.9 (Latest) | Published: Apr 6, 2026 | License: MIT | Imports: 10 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/DotNetAge/gorag

Documentation ¶

Overview ¶

Package indexing provides the core indexing pipeline for offline data preparation.

This package defines the Indexer interface which serves as the entry point for processing documents through parsing, chunking, embedding, and storage stages.

Index ¶

type FileWatcher
- func NewFileWatcher(indexer Indexer, logger logging.Logger) (*FileWatcher, error)
type Indexer
type WatchConfig

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type FileWatcher ¶

type FileWatcher struct {
	// contains filtered or unexported fields
}

FileWatcher is a file system watcher.

func NewFileWatcher ¶

func NewFileWatcher(indexer Indexer, logger logging.Logger) (*FileWatcher, error)

NewFileWatcher creates a file watcher.

func (*FileWatcher) AddConfigs ¶

func (fw *FileWatcher) AddConfigs(configs ...WatchConfig)

AddConfigs adds multiple watch configurations.

func (*FileWatcher) Start ¶

func (fw *FileWatcher) Start() error

Start starts file watching (blocking).

func (*FileWatcher) Stop ¶

func (fw *FileWatcher) Stop() error

Stop stops file watching.
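
Because Start is documented as blocking, a caller will usually run it in a goroutine and call Stop when it wants to shut down. The following is a minimal sketch of that pattern, not code from gorag. The watcher interface, fakeWatcher, runUntil, and demo are hypothetical names introduced here; only the Start() error and Stop() error signatures come from the documentation above. It also assumes that calling Stop causes a blocked Start to return, which the documentation does not state.

```go
package main

import "fmt"

// watcher is a hypothetical narrowed interface holding the two FileWatcher
// methods documented above, so the sketch compiles without importing gorag.
type watcher interface {
	Start() error
	Stop() error
}

// fakeWatcher is an illustrative stand-in for *indexing.FileWatcher.
// Its Start blocks until Stop is called.
type fakeWatcher struct {
	done chan struct{}
}

func newFakeWatcher() *fakeWatcher {
	return &fakeWatcher{done: make(chan struct{})}
}

func (f *fakeWatcher) Start() error {
	<-f.done // block, mimicking the documented blocking behaviour
	return nil
}

func (f *fakeWatcher) Stop() error {
	close(f.done)
	return nil
}

// runUntil runs w.Start in a goroutine and calls w.Stop once stop is closed.
// It returns the first non-nil error from Start or Stop.
func runUntil(w watcher, stop <-chan struct{}) error {
	startErr := make(chan error, 1)
	go func() { startErr <- w.Start() }()

	select {
	case err := <-startErr:
		// Start returned on its own, for example because it failed early.
		return err
	case <-stop:
		stopErr := w.Stop()
		// Assumption: Stop unblocks Start, so this receive completes.
		if err := <-startErr; err != nil {
			return err
		}
		return stopErr
	}
}

// demo requests shutdown immediately and reports what runUntil returns.
func demo() error {
	stop := make(chan struct{})
	close(stop)
	return runUntil(newFakeWatcher(), stop)
}

func main() {
	fmt.Println("watcher exited with error:", demo())
}
```

In a real program the stop channel would more likely be the Done channel of a context created with signal.NotifyContext, and w would be the value returned by NewFileWatcher after AddConfigs has been called.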

type Indexer ¶

type Indexer interface {
	// IndexFile processes a single file into the Vector/Graph stores.
	IndexFile(ctx context.Context, filePath string) (*core.IndexingContext, error)

	// IndexDirectory concurrently processes an entire directory.
	IndexDirectory(ctx context.Context, dirPath string, recursive bool) error

	// IndexText indexes plain text content directly (no file parsing required).
	// This is useful for programmatic document management from APIs, databases, etc.
	IndexText(ctx context.Context, text string, metadata ...map[string]any) error

	// IndexTexts indexes multiple plain text contents in batch.
	IndexTexts(ctx context.Context, texts []string, metadata ...map[string]any) error

	// IndexDocuments indexes documents directly into Vector/Doc/Graph stores.
	IndexDocuments(ctx context.Context, docs ...*core.Document) error

	// DeleteDocument removes a document and all its associated chunks and vectors.
	DeleteDocument(ctx context.Context, docID string) error

	// GetDocument retrieves a document by its ID.
	GetDocument(ctx context.Context, docID string) (*core.Document, error)
}

Indexer defines the entry point for the offline data preparation pipeline. It provides methods to process individual files or entire directories into the RAG knowledge base.
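
Since Indexer is an interface, calling code can depend on just the methods it needs and be tested against a fake. The sketch below illustrates this under stated assumptions: TextIndexer, fakeIndexer, ingestNotes, and demo are hypothetical names, and fakeIndexer is an in-memory stand-in rather than gorag's implementation. Only the IndexText signature is copied from the interface above; the metadata keys are made up for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// TextIndexer is a hypothetical narrowed view of indexing.Indexer.
// The method signature is copied from the Indexer interface above, so any
// value implementing the full Indexer also satisfies TextIndexer.
type TextIndexer interface {
	IndexText(ctx context.Context, text string, metadata ...map[string]any) error
}

// fakeIndexer is an illustrative in-memory stand-in used for testing callers.
type fakeIndexer struct {
	texts []string
	metas []map[string]any
}

// Compile-time check that the fake satisfies the narrowed interface.
var _ TextIndexer = (*fakeIndexer)(nil)

func (f *fakeIndexer) IndexText(ctx context.Context, text string, metadata ...map[string]any) error {
	if err := ctx.Err(); err != nil {
		return err // honour cancellation before doing any work
	}
	if text == "" {
		return errors.New("empty text")
	}
	f.texts = append(f.texts, text)
	f.metas = append(f.metas, metadata...)
	return nil
}

// ingestNotes is a hypothetical caller that indexes each note with
// illustrative metadata and stops at the first failure.
// It returns the number of notes indexed successfully.
func ingestNotes(ctx context.Context, idx TextIndexer, source string, notes []string) (int, error) {
	for i, note := range notes {
		meta := map[string]any{"source": source, "position": i}
		if err := idx.IndexText(ctx, note, meta); err != nil {
			return i, fmt.Errorf("index note %d: %w", i, err)
		}
	}
	return len(notes), nil
}

// demo runs ingestNotes against the fake and returns what was recorded.
func demo(notes []string) (indexed int, stored []string, err error) {
	idx := &fakeIndexer{}
	indexed, err = ingestNotes(context.Background(), idx, "crm", notes)
	return indexed, idx.texts, err
}

func main() {
	n, stored, err := demo([]string{"first note", "second note"})
	fmt.Println(n, stored, err)
}
```

Accepting the narrowed interface keeps ingestNotes independent of file parsing and storage, so it can be exercised without a vector or graph store.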

type WatchConfig ¶

type WatchConfig struct {
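	Path             string        // directory to watch
	Recursive        bool          // whether to watch subdirectories recursively
	Patterns         []string      // file match patterns (e.g. []string{"*.pdf", "*.md"})
	Exclude          []string      // file patterns to exclude
	DebounceInterval time.Duration // debounce interval; defaults to 500ms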
}

WatchConfig is the file watching configuration.
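
To make the interaction of Patterns, Exclude, and DebounceInterval concrete, here is a small sketch of how such a configuration could be interpreted. It is not taken from gorag, and the package's actual matching rules are not documented here. The local WatchConfig struct mirrors the fields above so the example compiles on its own; shouldIndex, debounce, and exampleConfig are hypothetical helpers. The assumptions are that patterns are shell globs matched against the file's base name with filepath.Match, that Exclude takes precedence over Patterns, that an empty Patterns list matches every file, and that a zero DebounceInterval means "use the 500ms default".

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// WatchConfig mirrors the fields of indexing.WatchConfig shown above.
type WatchConfig struct {
	Path             string
	Recursive        bool
	Patterns         []string
	Exclude          []string
	DebounceInterval time.Duration
}

// shouldIndex reports whether path would be picked up under cfg.
// Assumption: globs are applied to the base name and excludes win.
func shouldIndex(cfg WatchConfig, path string) bool {
	name := filepath.Base(path)
	for _, pat := range cfg.Exclude {
		if ok, _ := filepath.Match(pat, name); ok {
			return false
		}
	}
	if len(cfg.Patterns) == 0 {
		return true // assumption: no patterns means every file matches
	}
	for _, pat := range cfg.Patterns {
		if ok, _ := filepath.Match(pat, name); ok {
			return true
		}
	}
	return false
}

// debounce returns the configured interval, or the documented 500ms
// default when none is set (assumption: zero or negative means unset).
func debounce(cfg WatchConfig) time.Duration {
	if cfg.DebounceInterval <= 0 {
		return 500 * time.Millisecond
	}
	return cfg.DebounceInterval
}

// exampleConfig watches PDF and Markdown files under docs, skipping drafts.
func exampleConfig() WatchConfig {
	return WatchConfig{
		Path:      "docs",
		Recursive: true,
		Patterns:  []string{"*.pdf", "*.md"},
		Exclude:   []string{"draft*"},
	}
}

func main() {
	cfg := exampleConfig()
	for _, p := range []string{"docs/report.pdf", "docs/draft-notes.md", "docs/image.png"} {
		fmt.Printf("%s indexed=%v\n", p, shouldIndex(cfg, p))
	}
	fmt.Println("debounce:", debounce(cfg))
}
```

With the real package, a value like the one built in exampleConfig would be passed to AddConfigs on a FileWatcher rather than evaluated by hand.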

Directories ¶

Path	Synopsis
chunker
community	Package community provides community detection algorithms for GraphRAG.
parser
base
config
config/types
csv
dbschema
docx
email
excel
gocode
html
image
javacode
jscode
json
log
markdown
pdf
ppt
pycode
text
tscode
xml
yaml
