Documentation
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CleanString ¶
Types ¶
type Basic ¶
type Basic struct {
	// contains filtered or unexported fields
}
Basic is the simplest Extractor implementation.
func NewBasicExtractor ¶
NewBasicExtractor creates a new Basic instance.
func (Basic) Extract ¶
func (b Basic) Extract(_ context.Context, ri *storageProvider.ResourceInfo) (Document, error)
Extract rearranges the fields of the given ResourceInfo into a Document; it does not read the resource's content.
type Document ¶
type Document struct {
	Title    string
	Name     string
	Content  string
	Size     uint64
	Mtime    string
	MimeType string
	Tags     []string
	Audio    *libregraph.Audio          `json:"audio,omitempty"`
	Image    *libregraph.Image          `json:"image,omitempty"`
	Location *libregraph.GeoCoordinates `json:"location,omitempty"`
	Photo    *libregraph.Photo          `json:"photo,omitempty"`
}
Document wraps all resource meta fields; it is used as the result of content extraction.
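The struct tags on the optional facets affect how a Document serializes. The sketch below uses a cut-down `document` type and a hypothetical `audio` stand-in for `libregraph.Audio` to show the standard `encoding/json` behavior: untagged fields are emitted under their Go field names, and a nil pointer tagged `omitempty` is left out entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// audio is a hypothetical stand-in for libregraph.Audio.
type audio struct {
	Artist string `json:"artist,omitempty"`
}

// document is a reduced copy of Document, kept small for illustration.
type document struct {
	Title string
	Name  string
	Size  uint64
	Tags  []string
	Audio *audio `json:"audio,omitempty"`
}

// encode marshals a document to a JSON string.
func encode(d document) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	docs := []document{
		{Name: "notes.txt", Size: 12},                        // no audio facet
		{Name: "song.mp3", Audio: &audio{Artist: "Example"}}, // audio facet set
	}
	for _, d := range docs {
		s, err := encode(d)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		fmt.Println(s)
	}
}
```

Only the second document carries an `audio` key in its output; the first omits it because the pointer is nil.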
type Extractor ¶
type Extractor interface {
	Extract(ctx context.Context, ri *provider.ResourceInfo) (Document, error)
}
Extractor is responsible for extracting content and meta information from documents.
type Retriever ¶
type Retriever interface {
	Retrieve(ctx context.Context, rID *provider.ResourceId) (io.ReadCloser, error)
}
Retriever is the interface that wraps the basic Retrieve method. 🐕 It requests and then returns a resource from the underlying storage.
type Tika ¶
type Tika struct {
	*Basic
	Retriever
	ContentExtractionSizeLimit uint64
	CleanStopWords             bool
	// contains filtered or unexported fields
}
Tika extracts content from a resource; it uses Apache Tika to retrieve all the data.
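The documentation does not spell out how ContentExtractionSizeLimit is applied. One plausible reading, sketched below purely as an assumption, is that the limit is compared against the resource size in bytes and that empty or oversized resources are returned with metadata only. The `limitedExtractor` type and its `extractContent` hook are hypothetical; the hook stands in for the call to Apache Tika.

```go
package main

import "fmt"

// resourceInfo and document are hypothetical stand-ins for
// provider.ResourceInfo and Document.
type resourceInfo struct {
	Name string
	Size uint64
}

type document struct {
	Name    string
	Size    uint64
	Content string
}

// limitedExtractor skips content extraction for empty or oversized resources.
type limitedExtractor struct {
	ContentExtractionSizeLimit uint64
	// extractContent is a hypothetical hook standing in for Apache Tika.
	extractContent func(ri resourceInfo) (string, error)
}

func (e limitedExtractor) Extract(ri resourceInfo) (document, error) {
	doc := document{Name: ri.Name, Size: ri.Size}
	// Assumption: the limit is in bytes and applies to the resource size.
	if ri.Size == 0 || ri.Size > e.ContentExtractionSizeLimit {
		return doc, nil // metadata only
	}
	content, err := e.extractContent(ri)
	if err != nil {
		return doc, err
	}
	doc.Content = content
	return doc, nil
}

func main() {
	e := limitedExtractor{
		ContentExtractionSizeLimit: 100,
		extractContent: func(ri resourceInfo) (string, error) {
			return "text of " + ri.Name, nil
		},
	}
	for _, ri := range []resourceInfo{{Name: "small.txt", Size: 10}, {Name: "huge.bin", Size: 1000}} {
		doc, err := e.Extract(ri)
		if err != nil {
			fmt.Println("extract failed:", err)
			return
		}
		fmt.Printf("%s: content=%q\n", doc.Name, doc.Content)
	}
}
```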
func NewTikaExtractor ¶
func NewTikaExtractor(gatewaySelector pool.Selectable[gateway.GatewayAPIClient], logger log.Logger, cfg *config.Config) (*Tika, error)
NewTikaExtractor creates a new Tika instance.