Documentation
Types ¶
type ContentExtractor ¶
ContentExtractor is a function type that takes an io.Reader and returns the full extracted text. It is used for legacy parsers and for file formats that cannot easily be streamed, such as DOCX or PDF.
type GenericStreamWrapper ¶
type GenericStreamWrapper struct {
	// contains filtered or unexported fields
}
GenericStreamWrapper adapts a full-read ContentExtractor to the streaming core.Parser interface. It serves as a bridge for legacy parsers.
func NewGenericStreamWrapper ¶
func NewGenericStreamWrapper(name string, types []string, extractor ContentExtractor) *GenericStreamWrapper
func (*GenericStreamWrapper) GetSupportedTypes ¶
func (w *GenericStreamWrapper) GetSupportedTypes() []string
type Parser ¶
type Parser interface {
	// ParseStream reads from an io.Reader and streams parsed Document objects.
	// Streaming is intended to keep memory use O(1) in the input size when handling massive files (e.g., 2GB logs).
	ParseStream(ctx context.Context, r io.Reader, metadata map[string]any) (<-chan *core.Document, error)
	// GetSupportedTypes returns the file extensions or MIME types this parser supports.
	GetSupportedTypes() []string
}
Parser defines the streaming document parser for Next-Gen RAG.
type Triple ¶
type Triple struct {
	Subject     string `json:"subject"`
	Predicate   string `json:"predicate"`
	Object      string `json:"object"`
	SubjectType string `json:"subject_type"`
	ObjectType  string `json:"object_type"`
}
Triple represents a relationship between two entities.
type TriplesExtractor ¶
type TriplesExtractor struct {
	// contains filtered or unexported fields
}
TriplesExtractor uses an LLM to extract knowledge triples from text.
func NewTriplesExtractor ¶
func NewTriplesExtractor(llm chat.Client) *TriplesExtractor