Documentation
Functions
func NewIndexer
NewIndexer creates a new parent indexer that handles document splitting and sub-document management.
Parameters:
- ctx: context for the operation
- config: configuration for the parent indexer
Example usage:
	indexer, err := NewIndexer(ctx, &Config{
		Indexer:     milvusIndexer,
		Transformer: textSplitter,
		ParentIDKey: "source_doc_id",
		SubIDGenerator: func(ctx context.Context, parentID string, num int) ([]string, error) {
			ids := make([]string, num)
			for i := 0; i < num; i++ {
				ids[i] = fmt.Sprintf("%s_chunk_%d", parentID, i+1)
			}
			return ids, nil
		},
	})
Returns:
- indexer.Indexer: the created parent indexer
- error: any error encountered during creation
Types
type Config
type Config struct {
	// Indexer is the underlying indexer implementation that handles the actual document indexing.
	// For example: a vector database indexer like Milvus, or a full-text search indexer like Elasticsearch.
	Indexer indexer.Indexer

	// Transformer processes documents before indexing, typically splitting them into smaller chunks.
	// Each sub-document generated by the transformer must retain its parent document's ID.
	// For example: if a document with ID "doc_1" is split into 3 chunks, all chunks will initially
	// have ID "doc_1". These IDs will later be modified by the SubIDGenerator.
	//
	// Example transformations:
	//   - A text splitter that breaks down large documents into paragraphs
	//   - A code splitter that separates code files into functions
	Transformer document.Transformer

	// ParentIDKey specifies the metadata key used to store the original document's ID in each sub-document.
	// For example: if ParentIDKey is "parent_id", each sub-document will have metadata like:
	//   {"parent_id": "original_doc_123"}
	ParentIDKey string

	// SubIDGenerator generates unique IDs for sub-documents based on their parent document ID.
	// For example: if parent ID is "doc_1" and we need 3 sub-document IDs, it might generate:
	//   ["doc_1_chunk_1", "doc_1_chunk_2", "doc_1_chunk_3"]
	//
	// Parameters:
	//   - ctx: context for the operation
	//   - parentID: the ID of the parent document
	//   - num: number of sub-document IDs needed
	// Returns:
	//   - []string: slice of generated sub-document IDs
	//   - error: any error encountered during ID generation
	SubIDGenerator func(ctx context.Context, parentID string, num int) ([]string, error)
}