Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type ContentExtractor
ContentExtractor is a function type that takes an io.Reader and returns the document's full text. It is used for legacy parsers and for file types that cannot be easily streamed, such as DOCX or PDF.
type GenericStreamWrapper
type GenericStreamWrapper struct {
// contains filtered or unexported fields
}
GenericStreamWrapper adapts a full-read ContentExtractor to the streaming dataprep.Parser interface. It serves as a bridge for legacy parsers.
func NewGenericStreamWrapper
func NewGenericStreamWrapper(name string, types []string, extractor ContentExtractor) *GenericStreamWrapper
func (*GenericStreamWrapper) GetSupportedTypes
func (w *GenericStreamWrapper) GetSupportedTypes() []string