Documentation
Constants ¶
This section is empty.
Variables ¶
var ErrContentProcessingFailed = errors.New("content processing failed")
ErrContentProcessingFailed indicates that content processing failed.
Functions ¶
func InitializeProcessors ¶
func InitializeProcessors(cfg ProcessorConfig) ([]Processor, []Options, error)
InitializeProcessors builds the processors named in cfg.Processors, in the order listed, using any matching entries in cfg.ProcessorConfigs. It returns the processors and their options as parallel slices: the options at index i belong to the processor at index i.
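The index pairing between the two returned slices can be illustrated with a small runner. This is a sketch under stated assumptions: Article is a hypothetical stand-in for models.Article, Options is reduced to one field, and trimProcessor and runAll are invented for illustration. In real code the two slices would come from InitializeProcessors.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// Article is a hypothetical stand-in for models.Article.
type Article struct {
	Content string
}

// Options mirrors only the field this sketch needs.
type Options struct {
	MinContentLength int
}

// Processor mirrors the documented interface, using the stand-in Article.
type Processor interface {
	Process(article *Article, opts *Options) error
	Name() string
}

// trimProcessor is a hypothetical processor that strips surrounding whitespace.
type trimProcessor struct{}

func (trimProcessor) Name() string { return "trim" }

func (trimProcessor) Process(a *Article, opts *Options) error {
	a.Content = strings.TrimSpace(a.Content)
	if utf8.RuneCountInString(a.Content) < opts.MinContentLength {
		return errors.New("content too short")
	}
	return nil
}

// runAll applies procs[i] with opts[i] and stops at the first error.
func runAll(a *Article, procs []Processor, opts []Options) error {
	if len(procs) != len(opts) {
		return fmt.Errorf("got %d processors but %d options", len(procs), len(opts))
	}
	for i, p := range procs {
		if err := p.Process(a, &opts[i]); err != nil {
			return fmt.Errorf("processor %q: %w", p.Name(), err)
		}
	}
	return nil
}

func main() {
	a := &Article{Content: "  hello world  "}
	procs := []Processor{trimProcessor{}}
	opts := []Options{{MinContentLength: 5}}
	if err := runAll(a, procs, opts); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", a.Content)
}
```

Checking that the slices have equal length before iterating guards against an index panic if the two ever get out of sync.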
Types ¶
type Options ¶
type Options struct {
// Minimum content length to consider valid (in characters)
MinContentLength int
// Whether to include images in the processed content
IncludeImages bool
// Whether to include tables in the processed content
IncludeTables bool
// Whether to include videos in the processed content
IncludeVideos bool
// Maximum length for article content (0 means no limit)
MaxContentLength int
// Additional processor-specific options
AdditionalOptions map[string]any
}
Options contains configuration for content processors.
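The two length fields could be applied roughly as follows. This is only a sketch: Options is reduced to the two length fields, applyLengthLimits is a hypothetical helper, and truncating over-length content (rather than rejecting it) is an assumption, since the documentation states only that 0 means no limit. Lengths are counted in runes to match the "in characters" wording.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Options mirrors only the two length fields of the documented struct.
type Options struct {
	MinContentLength int
	MaxContentLength int
}

// applyLengthLimits is a hypothetical helper. It reports false when content
// is shorter than MinContentLength, and truncates content to MaxContentLength
// characters when that limit is non-zero (truncation is an assumption).
func applyLengthLimits(content string, opts Options) (string, bool) {
	if utf8.RuneCountInString(content) < opts.MinContentLength {
		return content, false
	}
	if opts.MaxContentLength > 0 {
		runes := []rune(content)
		if len(runes) > opts.MaxContentLength {
			content = string(runes[:opts.MaxContentLength])
		}
	}
	return content, true
}

func main() {
	out, ok := applyLengthLimits("héllo wörld", Options{MinContentLength: 5, MaxContentLength: 5})
	fmt.Println(out, ok)
}
```

Converting to a rune slice before truncating avoids cutting a multi-byte UTF-8 character in half, which slicing the string by byte index could do.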
func DefaultOptions ¶
func DefaultOptions() Options
DefaultOptions returns the default processor options.
type Processor ¶
type Processor interface {
// Process processes the raw content and updates the article with processed content
Process(article *models.Article, opts *Options) error
// Name returns the name of this processor
Name() string
}
Processor defines the interface for content processors.
type ProcessorConfig ¶
type ProcessorConfig struct {
// Processors is a list of processors to apply in the order defined
Processors []string `toml:"processors"`
// ProcessorConfigs contains optional configuration for each processor. If
// a processor is not configured, it will use its default settings.
ProcessorConfigs map[string]any `toml:"processor_configs"`
}
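A ProcessorConfig can also be built directly in Go instead of being decoded from TOML. The sketch below redeclares the struct locally and shows the fallback described in the field comment: a processor with no entry in ProcessorConfigs uses its defaults. The processor names "readability" and "sanitizer", the key "min_content_length", and the settingsFor helper are all assumptions made for illustration.

```go
package main

import "fmt"

// ProcessorConfig is a local copy of the documented struct.
type ProcessorConfig struct {
	Processors       []string       `toml:"processors"`
	ProcessorConfigs map[string]any `toml:"processor_configs"`
}

// settingsFor is a hypothetical helper that returns the raw settings for a
// processor name and whether any were configured. Reading from a nil map is
// safe in Go, so an empty ProcessorConfig needs no special handling.
func settingsFor(cfg ProcessorConfig, name string) (any, bool) {
	v, ok := cfg.ProcessorConfigs[name]
	return v, ok
}

func main() {
	cfg := ProcessorConfig{
		// Names and keys below are assumed, not taken from the package.
		Processors: []string{"readability", "sanitizer"},
		ProcessorConfigs: map[string]any{
			"readability": map[string]any{"min_content_length": 200},
		},
	}
	for _, name := range cfg.Processors {
		if s, ok := settingsFor(cfg, name); ok {
			fmt.Printf("%s: configured with %v\n", name, s)
		} else {
			fmt.Printf("%s: using defaults\n", name)
		}
	}
}
```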
type ReadabilityProcessor ¶
type ReadabilityProcessor struct{}
ReadabilityProcessor uses go-readability to extract the main content from HTML.
func NewReadabilityProcessor ¶
func NewReadabilityProcessor() *ReadabilityProcessor
NewReadabilityProcessor creates a new readability-based content processor.
func (*ReadabilityProcessor) Name ¶
func (p *ReadabilityProcessor) Name() string
Name returns the name of this processor.
type SanitizerProcessor ¶ added in v0.2.0
type SanitizerProcessor struct{}
SanitizerProcessor is a processor that sanitizes HTML content.
func NewSanitizerProcessor ¶ added in v0.2.0
func NewSanitizerProcessor() *SanitizerProcessor
NewSanitizerProcessor creates a new instance of SanitizerProcessor.
func (*SanitizerProcessor) Name ¶ added in v0.2.0
func (s *SanitizerProcessor) Name() string
Name returns the name of this processor.