Documentation
¶
Index ¶
- type IndexOptions
- type Pipeline
- func (p *Pipeline) Finalize(ctx context.Context, verbose bool, force ...bool) error
- func (p *Pipeline) IndexPath(ctx context.Context, path string, opts IndexOptions) error
- func (p *Pipeline) IndexURL(ctx context.Context, rootURL string, opts IndexOptions) error
- func (p *Pipeline) Prune(ctx context.Context) (int, error)
- type ProgressEvent
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type IndexOptions ¶
type IndexOptions struct {
Force bool
Workers int
Verbose bool
Progress chan<- ProgressEvent
// Web crawl options (used by IndexURL)
MaxPages int
MaxDepth int
SkipSitemap bool
}
IndexOptions controls indexing behavior.
type Pipeline ¶
type Pipeline struct {
// contains filtered or unexported fields
}
Pipeline orchestrates the 5-phase GraphRAG pipeline.
func (*Pipeline) Finalize ¶
Finalize runs Phases 3-4: community detection + parallel summaries. If force is true, the graph fingerprint cache is ignored and communities are always regenerated.
func (*Pipeline) IndexURL ¶
IndexURL crawls a documentation website and indexes all discovered pages.
func (*Pipeline) Prune ¶
versionInfo returns the next version number and canonical ID. Prune removes documents whose source file no longer exists on disk. Returns the number of rows deleted. Only "real" file-backed documents (absolute filesystem paths) are considered — web-crawled rows with http(s):// paths are left alone.
type ProgressEvent ¶
type ProgressEvent struct {
Phase string
Message string
File string
ChunksDone int
ChunksTotal int
Done bool
Error error
}
ProgressEvent sent over progress channel.
Phase is one of: "queued", "load", "chunk", "embed", "extract_entities", "extract_relationships", "extract_claims", "structure", "finalize", "done", "error". Done=true marks the terminal event for a job.
ChunksDone / ChunksTotal are populated for the "embed" phase and any chunk-aware phases; other phases leave them at 0. File is the basename of the file currently being processed (empty for job-wide events).