indexer

package

v0.2.3 (latest) Published: Jun 3, 2026 License: MIT Imports: 17 Imported by: 0

Repository

github.com/imhuso/hce

Documentation ¶

Index ¶

func CollectionName(codebaseID string) string
type Indexer
- func NewIndexer(cfg IndexerConfig) *Indexer
type IndexerConfig

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CollectionName ¶

func CollectionName(codebaseID string) string

CollectionName maps a codebase_id to a Milvus collection name (avoiding special characters).
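The mapping itself is not shown on this page. As a rough illustration only, one way such a function could work is to keep ASCII letters, digits, and underscores and replace everything else with an underscore. The function name `sanitizeCollectionName` and the `hce_` prefix are assumptions made for this sketch, not the package's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// collectionPrefix is a hypothetical prefix chosen for this sketch;
// the real package may use a different one.
const collectionPrefix = "hce_"

// sanitizeCollectionName is an illustrative stand-in for CollectionName.
// It keeps ASCII letters, digits, and underscores, and replaces every
// other rune with a single underscore.
func sanitizeCollectionName(codebaseID string) string {
	var b strings.Builder
	b.WriteString(collectionPrefix)
	for _, r := range codebaseID {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	fmt.Println(sanitizeCollectionName("github.com/acme/repo-1"))
	// prints "hce_github_com_acme_repo_1"
}
```

A replacement scheme like this one is lossy: `a.b` and `a/b` collapse to the same name, so the original codebase_id cannot be recovered from the collection name alone.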

Types ¶

type Indexer ¶

type Indexer struct {
	// contains filtered or unexported fields
}

Indexer is a push-mode indexing orchestrator: the server only receives file contents and handles splitting, deduplication, embedding, and search.

func NewIndexer ¶

func NewIndexer(cfg IndexerConfig) *Indexer

NewIndexer creates an indexing orchestrator.

func (*Indexer) Clear ¶

func (idx *Indexer) Clear(ctx context.Context, codebaseID string) error

Clear removes the entire index of a codebase.

func (*Indexer) DeleteFiles ¶

func (idx *Indexer) DeleteFiles(ctx context.Context, codebaseID string, relativePaths []string) (int, error)

DeleteFiles deletes all chunks belonging to the specified files.

func (*Indexer) Flush ¶

func (idx *Indexer) Flush(ctx context.Context, codebaseID string) error

Flush forces the codebase's growing segment to be persisted so that newly written chunks become searchable. Call it once when the whole sync has finished, not after every batch.
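The calling pattern described above could look roughly like the following. To stay self-contained, the sketch uses a hypothetical `batchIndexer` interface with a simplified `IndexBatch` method in place of the real `IndexFiles` (which takes `[]model.PushFile` and returns `*model.UpsertResult`); `syncAll`, `countingIndexer`, and `runDemo` are likewise illustration-only names.

```go
package main

import (
	"context"
	"fmt"
)

// batchIndexer is a hypothetical, simplified view of the two Indexer
// methods involved. IndexBatch stands in for the real IndexFiles.
type batchIndexer interface {
	IndexBatch(ctx context.Context, codebaseID string, paths []string) error
	Flush(ctx context.Context, codebaseID string) error
}

// syncAll pushes every batch and flushes exactly once at the end.
// If any batch fails, it returns early without flushing.
func syncAll(ctx context.Context, idx batchIndexer, codebaseID string, batches [][]string) error {
	for _, batch := range batches {
		if err := idx.IndexBatch(ctx, codebaseID, batch); err != nil {
			return err
		}
	}
	return idx.Flush(ctx, codebaseID)
}

// countingIndexer is a test double that records how often each method runs.
type countingIndexer struct{ batches, flushes int }

func (c *countingIndexer) IndexBatch(context.Context, string, []string) error {
	c.batches++
	return nil
}

func (c *countingIndexer) Flush(context.Context, string) error {
	c.flushes++
	return nil
}

// runDemo syncs three single-file batches and reports the call counts.
func runDemo() (batches, flushes int, err error) {
	c := &countingIndexer{}
	err = syncAll(context.Background(), c, "demo", [][]string{{"a.go"}, {"b.go"}, {"c.go"}})
	return c.batches, c.flushes, err
}

func main() {
	b, f, err := runDemo()
	fmt.Println(b, f, err) // prints "3 1 <nil>"
}
```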

func (*Indexer) IndexFiles ¶

func (idx *Indexer) IndexFiles(ctx context.Context, codebaseID string, files []model.PushFile) (*model.UpsertResult, error)

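IndexFiles processes a batch of files pushed by the client, applying chunk-level incremental updates per file.

Algorithm (each file is processed independently):

1. Split the file into newChunks and compute the sha256 of each chunk's content → newHashes
2. Query Milvus for the chunks already stored for this file (id, chunk_hash) → existing
3. existing ∩ new (matched by hash) = reused; embedding is skipped
4. existing - new = old chunks that no longer exist; these are deleted
5. new - existing = genuinely new chunks; these are embedded in batch and inserted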
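The set arithmetic in the steps above can be sketched as follows. This is a minimal illustration, not the package's code: `storedChunk`, `chunkHash`, and `diffChunks` are hypothetical names, the hash is assumed to be a hex-encoded SHA-256 of the raw chunk text, and the Milvus query, embedding, and insert calls are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// storedChunk is a hypothetical view of a chunk row already in the vector store.
type storedChunk struct {
	ID   string
	Hash string
}

// chunkHash returns the hex-encoded SHA-256 of a chunk's content.
func chunkHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// diffChunks compares one file's stored chunks with its freshly split chunks.
// It returns how many stored chunks can be reused, the IDs of stale chunks
// to delete, and the contents of new chunks that still need embedding.
func diffChunks(existing []storedChunk, newChunks []string) (reused int, deleteIDs, toEmbed []string) {
	newHashes := make(map[string]bool, len(newChunks))
	for _, c := range newChunks {
		newHashes[chunkHash(c)] = true
	}

	existingHashes := make(map[string]bool, len(existing))
	for _, e := range existing {
		existingHashes[e.Hash] = true
		if newHashes[e.Hash] {
			reused++ // existing ∩ new: keep the stored vector
		} else {
			deleteIDs = append(deleteIDs, e.ID) // existing - new
		}
	}

	for _, c := range newChunks {
		if !existingHashes[chunkHash(c)] {
			toEmbed = append(toEmbed, c) // new - existing
		}
	}
	return reused, deleteIDs, toEmbed
}

func main() {
	existing := []storedChunk{
		{ID: "1", Hash: chunkHash("func a() {}")},
		{ID: "2", Hash: chunkHash("func b() {}")},
	}
	reused, del, emb := diffChunks(existing, []string{"func b() {}", "func c() {}"})
	fmt.Println(reused, del, emb) // prints "1 [1] [func c() {}]"
}
```

Only the chunks in `toEmbed` reach the embedding model, so editing one function in a large file costs one embedding call rather than a re-embed of the whole file.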

func (*Indexer) ListIndexes ¶

func (idx *Indexer) ListIndexes(ctx context.Context) ([]model.IndexInfo, error)

ListIndexes lists all HCE collections (filtered by collection-name prefix).

func (*Indexer) Search ¶

func (idx *Indexer) Search(ctx context.Context, codebaseID, query string, topK int) ([]model.SearchResult, error)

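Search performs semantic search (hybrid: dense vector + keyword inverted index).

Algorithm:

1. Dense vector search retrieves top_k * 3 candidates (over-recalling leaves room for keyword hits to make it into the results)
2. The keyword inverted index finds every chunk_id matched by the key tokens in the query
3. Merge:
   - a chunk hit by both vector and keyword search gets a score boost (+0.05 per matched token)
   - a chunk hit only by keyword search has its metadata fetched from Milvus in a separate query; base score = 0.5 + match count × 0.03
4. Sort by final score and take the top_k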
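The merge and ranking steps can be sketched as below, using the constants quoted above. `scored` and `mergeHybrid` are hypothetical names, and the sketch assumes the caller has already collected the top_k * 3 vector candidates and the per-chunk token match counts; tokenization and the Milvus metadata lookup are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// scored pairs a chunk ID with its final ranking score.
type scored struct {
	ChunkID string
	Score   float64
}

// mergeHybrid combines dense-vector candidates with keyword hits.
// vectorHits maps chunk ID to vector similarity score; keywordHits maps
// chunk ID to the number of query tokens that chunk matched.
func mergeHybrid(vectorHits map[string]float64, keywordHits map[string]int, topK int) []scored {
	out := make([]scored, 0, len(vectorHits)+len(keywordHits))

	// Vector candidates: add 0.05 per matched token (zero if no keyword hit).
	for id, s := range vectorHits {
		out = append(out, scored{ChunkID: id, Score: s + 0.05*float64(keywordHits[id])})
	}
	// Keyword-only chunks: base score 0.5 plus 0.03 per matched token.
	for id, n := range keywordHits {
		if _, ok := vectorHits[id]; !ok {
			out = append(out, scored{ChunkID: id, Score: 0.5 + 0.03*float64(n)})
		}
	}

	// Highest score first; ties broken by ID so the order is deterministic.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ChunkID < out[j].ChunkID
	})
	if topK >= 0 && len(out) > topK {
		out = out[:topK]
	}
	return out
}

func main() {
	vectorHits := map[string]float64{"a": 0.80, "b": 0.70}
	keywordHits := map[string]int{"b": 3, "c": 2}
	for _, r := range mergeHybrid(vectorHits, keywordHits, 2) {
		fmt.Println(r.ChunkID) // prints "b", then "a"
	}
}
```

In this example chunk `b` starts below `a` on vector similarity but overtakes it after three token matches (0.70 + 0.15), while the keyword-only chunk `c` scores 0.56 and falls outside the top 2.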

type IndexerConfig ¶

type IndexerConfig struct {
	Splitter      splitter.Splitter
	Embedding     embedding.Embedding
	VectorDB      vectordb.VectorDB
	MinChunkBytes int    // drop chunks smaller than this size (0 = no filtering)
	RegistryPath  string // persistence path for the collection→codebase_id mapping (empty = in-memory only)
}

IndexerConfig holds the indexer configuration.
