index

package
v4.21.0
Published: Apr 2, 2026 License: GPL-3.0 Imports: 40 Imported by: 0

Documentation

Overview

Package index provides interfaces for indexing document metadata and retrieving that metadata from the index. Currently, the only implementation of these interfaces uses Bleve.

Constants

const AuthorVersion = "1"

AuthorVersion identifies the mapping used for indexing authors. Any change to the mapping requires a version increase, to signal that a new index needs to be created.

const DocumentVersion = "v11"

DocumentVersion identifies the mapping used for indexing documents. Any change to the mapping requires a version increase, to signal that a new index needs to be created.
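As a rough illustration of how a mapping version can signal that a new index is needed, the sketch below compares a previously recorded version string with the current one. How the package actually records or compares versions is not described here; documentVersion and needsNewIndex are hypothetical local names.

```go
package main

import "fmt"

// documentVersion is a local copy of the DocumentVersion value shown above.
const documentVersion = "v11"

// needsNewIndex is a hypothetical helper: it reports whether an index built
// with storedVersion should be recreated for currentVersion. An empty
// storedVersion (nothing recorded) also counts as a mismatch.
func needsNewIndex(storedVersion, currentVersion string) bool {
	return storedVersion != currentVersion
}

func main() {
	fmt.Println(needsNewIndex("v10", documentVersion))
	fmt.Println(needsNewIndex("v11", documentVersion))
}
```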

Variables

var ErrDocumentNotFound = errors.New("document not found")

ErrDocumentNotFound is returned when a document cannot be found by slug.
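Callers typically test for a sentinel error like this one with errors.Is, which works whether the error is returned directly or wrapped. The sketch below uses a local stand-in (errDocumentNotFound) and a hypothetical in-memory lookup rather than the package's own types.

```go
package main

import (
	"errors"
	"fmt"
)

// errDocumentNotFound is a local stand-in for index.ErrDocumentNotFound.
var errDocumentNotFound = errors.New("document not found")

// lookup is a hypothetical slug lookup against an in-memory store.
// It wraps the sentinel so callers can still match it with errors.Is.
func lookup(store map[string]string, slug string) (string, error) {
	title, ok := store[slug]
	if !ok {
		return "", fmt.Errorf("looking up %q: %w", slug, errDocumentNotFound)
	}
	return title, nil
}

// isNotFound reports whether err is, or wraps, the sentinel error.
func isNotFound(err error) bool {
	return errors.Is(err, errDocumentNotFound)
}

func main() {
	store := map[string]string{"moby-dick": "Moby Dick"}
	if _, err := lookup(store, "missing"); isNotFound(err) {
		fmt.Println("no such document")
	}
}
```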

Functions

func CreateAuthorsIndex added in v4.16.0

func CreateAuthorsIndex(path string) bleve.Index

func CreateAuthorsMapping added in v4.16.0

func CreateAuthorsMapping() mapping.IndexMapping

func CreateDocumentsIndex added in v4.16.0

func CreateDocumentsIndex(path string) bleve.Index

func CreateDocumentsMapping added in v4.16.0

func CreateDocumentsMapping() mapping.IndexMapping

func MigrateAuthors added in v4.16.0

func MigrateAuthors(oldIndex, newIndex bleve.Index, batchSize int) error

MigrateAuthors migrates all authors from a legacy index to a new index. It searches only for items with Type = "author" and deletes them immediately after migration to avoid pagination issues and free up disk space.

func MigrateDocuments added in v4.16.0

func MigrateDocuments(oldIndex, newIndex bleve.Index, batchSize int) error

MigrateDocuments migrates all documents from a legacy index to a new documents index in batches. It always loads the first 1000 documents to avoid pagination issues, and deletes them immediately after migration to limit disk usage.
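The "always take the first batch, then delete it" approach described above can be sketched with plain maps standing in for the two indexes. This is only an illustration of the technique, not the package's implementation; migrate and its parameters are hypothetical.

```go
package main

import "fmt"

// migrate moves every entry from src to dst. Each pass takes the first
// batch of whatever remains in src, copies it, and deletes it from src,
// so no page offset has to be tracked between passes.
func migrate(src, dst map[string]string, batchSize int) (int, error) {
	if batchSize <= 0 {
		return 0, fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	batches := 0
	for len(src) > 0 {
		// Collect up to batchSize keys from the remaining entries.
		keys := make([]string, 0, batchSize)
		for k := range src {
			keys = append(keys, k)
			if len(keys) == batchSize {
				break
			}
		}
		// Copy, then delete immediately so src shrinks as dst grows.
		for _, k := range keys {
			dst[k] = src[k]
			delete(src, k)
		}
		batches++
	}
	return batches, nil
}

func main() {
	src := map[string]string{"a": "A", "b": "B", "c": "C", "d": "D", "e": "E"}
	dst := map[string]string{}
	batches, err := migrate(src, dst, 2)
	if err != nil {
		fmt.Println("migration failed:", err)
		return
	}
	fmt.Println(batches, len(src), len(dst))
}
```

Because the source shrinks on every pass, the loop ends when it is empty, and an interrupted run can resume without re-reading already migrated entries.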

func NeedsReindexForIllustratedConfig added in v4.20.0

func NeedsReindexForIllustratedConfig(documentsIndex bleve.Index, currentMinSize float64) (bool, error)

NeedsReindexForIllustratedConfig reports whether the documents index must be rebuilt because the stored illustrated-min-size config differs from currentMinSize (or is missing).
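The decision rule stated above (rebuild when the stored value differs or is missing) can be sketched as follows. How the value is stored in the index is not covered here; needsReindex is a hypothetical helper that takes a nil pointer to mean "nothing stored".

```go
package main

import "fmt"

// needsReindex reports whether a rebuild is required. stored is nil when
// no minimum size was recorded in the index. The values are compared
// exactly, on the assumption that the stored number is read back verbatim.
func needsReindex(stored *float64, current float64) bool {
	if stored == nil {
		return true
	}
	return *stored != current
}

func main() {
	stored := 0.5
	fmt.Println(needsReindex(&stored, 0.5))
	fmt.Println(needsReindex(&stored, 1.0))
	fmt.Println(needsReindex(nil, 0.5))
}
```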

Types

type Author added in v4.3.0

type Author struct {
	Slug            string
	Name            string
	BirthName       string
	DataSourceID    string
	RetrievedOn     time.Time
	WikipediaLink   map[string]string
	InstanceOf      float64
	Description     map[string]string
	DateOfBirth     precisiondate.PrecisionDate
	DateOfDeath     precisiondate.PrecisionDate
	Website         string
	DataSourceImage string
	Gender          float64
	Pseudonyms      []string
}

func (Author) Age added in v4.7.0

func (a Author) Age() int

func (Author) BirthNameIncludesName added in v4.7.0

func (a Author) BirthNameIncludesName() bool

func (Author) YearOfBirthAbs added in v4.7.0

func (a Author) YearOfBirthAbs() int

func (Author) YearOfDeathAbs added in v4.7.0

func (a Author) YearOfDeathAbs() int

type BleveIndexer

type BleveIndexer struct {
	// contains filtered or unexported fields
}

func NewBleve

func NewBleve(documentsIndex bleve.Index, authorsIndex bleve.Index, fs afero.Fs, libraryPath string, read map[string]metadata.Reader, cfg Config) *BleveIndexer

NewBleve creates a new BleveIndexer instance using the passed parameters.

func (*BleveIndexer) AddLibrary

func (b *BleveIndexer) AddLibrary(batchSize int, forceIndexing bool) error

AddLibrary scans <libraryPath> for documents and adds them to the index in batches of <batchSize> if they have not been indexed previously or if <forceIndexing> is true.

func (*BleveIndexer) Author added in v4.3.0

func (b *BleveIndexer) Author(slug, lang string) (Author, error)

func (*BleveIndexer) Close

func (b *BleveIndexer) Close() error

Close closes both indexes.

func (*BleveIndexer) Count

func (b *BleveIndexer) Count() (uint64, error)

Count returns the number of indexed documents.

func (*BleveIndexer) Cover added in v4.21.0

func (b *BleveIndexer) Cover(slug string, coverMaxWidth int) ([]byte, error)

Cover returns the cover image for the document identified by slug, resized to at most coverMaxWidth pixels wide.

func (*BleveIndexer) DeleteDocument added in v4.21.0

func (b *BleveIndexer) DeleteDocument(slug string) error

DeleteDocument removes the document identified by slug from the index and deletes its file from the filesystem.

func (*BleveIndexer) Document

func (b *BleveIndexer) Document(slug string) (Document, error)

func (*BleveIndexer) DocumentByID added in v4.9.0

func (b *BleveIndexer) DocumentByID(ID string) (Document, error)

Deprecated: Remove after migration.

func (*BleveIndexer) Documents

func (b *BleveIndexer) Documents(slugs []string) (map[string]Document, error)

Documents returns documents for the given slugs in a single search. Missing or invalid slugs are omitted.
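Since missing slugs are omitted rather than reported as errors, callers need to check the returned map for each slug they asked for. The sketch below mimics that contract with an in-memory store; documentsBySlug is a hypothetical stand-in, not the method itself.

```go
package main

import "fmt"

// documentsBySlug returns the entries found for slugs. Slugs that are
// not present in store are silently left out of the result.
func documentsBySlug(store map[string]string, slugs []string) map[string]string {
	found := make(map[string]string, len(slugs))
	for _, s := range slugs {
		if title, ok := store[s]; ok {
			found[s] = title
		}
	}
	return found
}

func main() {
	store := map[string]string{"dune": "Dune", "emma": "Emma"}
	wanted := []string{"dune", "unknown", "emma"}
	docs := documentsBySlug(store, wanted)
	for _, s := range wanted {
		// Use the comma-ok form to detect omitted slugs.
		if title, ok := docs[s]; ok {
			fmt.Println(s, "->", title)
		} else {
			fmt.Println(s, "-> not indexed")
		}
	}
}
```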

func (*BleveIndexer) File added in v4.21.0

func (b *BleveIndexer) File(slug string) (*IndexedFile, error)

File returns the raw document payload and metadata for the given slug.

func (*BleveIndexer) IndexAuthor added in v4.7.0

func (b *BleveIndexer) IndexAuthor(author Author) error

func (*BleveIndexer) IndexingProgress

func (b *BleveIndexer) IndexingProgress() (Progress, error)

func (*BleveIndexer) Languages added in v4.16.0

func (b *BleveIndexer) Languages() ([]string, error)

Languages returns a list of all unique languages in the indexed documents using faceted search.

func (*BleveIndexer) LatestDocs added in v4.7.0

func (b *BleveIndexer) LatestDocs(limit int) ([]Document, error)

func (*BleveIndexer) NewFile added in v4.21.0

func (b *BleveIndexer) NewFile(fileName string, contents []byte) (string, error)

NewFile writes the given contents to the library as fileName, indexes it, and returns the document slug.

func (*BleveIndexer) SameAuthors

func (b *BleveIndexer) SameAuthors(slugID string, quantity int) ([]Document, error)

SameAuthors returns the metadata of documents by the same authors that do not belong to the same collection.

func (*BleveIndexer) SameSeries

func (b *BleveIndexer) SameSeries(slugID string, quantity int) ([]Document, error)

SameSeries returns the metadata of documents in the same series.

func (*BleveIndexer) SameSubjects

func (b *BleveIndexer) SameSubjects(slugID string, quantity int) ([]Document, error)

SameSubjects returns the metadata of documents by other authors that have subjects similar to the passed document and do not belong to the same collection. They are sorted by subject match and then by date, with those closest to the publishing date of the reference document first.

func (*BleveIndexer) Search

func (b *BleveIndexer) Search(searchFields SearchFields, page, resultsPerPage int) (result.Paginated[[]Document], error)

Search looks for documents that match the passed keywords and filters. It returns a maximum of <resultsPerPage> documents, offset by <page>.
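For readers unfamiliar with page-based offsets, the sketch below shows one common way to turn a page number and page size into a result window. It assumes pages are numbered from 1, which the documentation above does not state; pageBounds is a hypothetical helper.

```go
package main

import "fmt"

// pageBounds returns the half-open range [from, to) of result positions
// for the given page, clamped to total. Pages are assumed to start at 1.
func pageBounds(total, page, resultsPerPage int) (from, to int) {
	if resultsPerPage < 1 {
		return 0, 0
	}
	if page < 1 {
		page = 1
	}
	from = (page - 1) * resultsPerPage
	if from > total {
		from = total
	}
	to = from + resultsPerPage
	if to > total {
		to = total
	}
	return from, to
}

func main() {
	// With 25 results and 10 per page, page 3 holds the last 5 results.
	fmt.Println(pageBounds(25, 3, 10)) // prints "20 25"
}
```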

func (*BleveIndexer) SearchByAuthor added in v4.3.0

func (b *BleveIndexer) SearchByAuthor(searchFields SearchFields, page, resultsPerPage int) (result.Paginated[[]Document], error)

func (*BleveIndexer) SearchBySeries added in v4.9.0

func (b *BleveIndexer) SearchBySeries(searchFields SearchFields, page, resultsPerPage int) (result.Paginated[[]Document], error)

func (*BleveIndexer) Slug

func (b *BleveIndexer) Slug(document Document, batchSlugs map[string]struct{}) string

Because the Bleve index is not updated until the batch is executed, the slugs processed in the current batch must be kept in memory so that the current document's slug can also be compared against them.
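The idea of checking a candidate slug against both the committed index and the slugs of the pending batch can be sketched as below. The numeric-suffix scheme and the uniqueSlug helper are assumptions made for illustration; the method's actual disambiguation rule is not documented here.

```go
package main

import "fmt"

// uniqueSlug returns base, or base with a numeric suffix, such that the
// result is in neither the committed index nor the pending batch. The
// chosen slug is recorded in batchSlugs so later documents in the same
// batch cannot reuse it.
func uniqueSlug(base string, indexed, batchSlugs map[string]struct{}) string {
	candidate := base
	for n := 2; ; n++ {
		_, inIndex := indexed[candidate]
		_, inBatch := batchSlugs[candidate]
		if !inIndex && !inBatch {
			batchSlugs[candidate] = struct{}{}
			return candidate
		}
		candidate = fmt.Sprintf("%s-%d", base, n)
	}
}

func main() {
	indexed := map[string]struct{}{"dune": {}}
	batch := map[string]struct{}{}
	fmt.Println(uniqueSlug("dune", indexed, batch))
	fmt.Println(uniqueSlug("dune", indexed, batch))
}
```

Without the in-memory set, two documents with the same title in one batch would both pass the index check and end up with the same slug.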

func (*BleveIndexer) StartFileWatcher added in v4.21.0

func (b *BleveIndexer) StartFileWatcher()

StartFileWatcher starts watching the library path for file changes and updates the index. It blocks until the process exits. Call it in a goroutine.

func (*BleveIndexer) Subjects added in v4.18.0

func (b *BleveIndexer) Subjects() (map[string][]string, error)

Subjects returns subject groups: each slug with all display names that map to it. It uses the Subjects field for faceting; names are normalized (first letter capitalized). Grouping uses slug.Make so that variants like "cronica" and "crónica" share one slug (slug transliterates accents).
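The grouping described above can be illustrated with a small standalone sketch. To stay free of third-party imports it replaces slug.Make with toSlug, a deliberately simplified stand-in that only handles a few accented letters; capitalize and groupSubjects are likewise hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// accents covers only a handful of letters; a real slug library
// transliterates far more.
var accents = strings.NewReplacer("á", "a", "é", "e", "í", "i", "ó", "o", "ú", "u", "ñ", "n")

// toSlug is a simplified stand-in for slug.Make.
func toSlug(name string) string {
	s := accents.Replace(strings.ToLower(strings.TrimSpace(name)))
	return strings.Join(strings.Fields(s), "-")
}

// capitalize upper-cases the first letter of name.
func capitalize(name string) string {
	r := []rune(strings.TrimSpace(name))
	if len(r) == 0 {
		return ""
	}
	r[0] = unicode.ToUpper(r[0])
	return string(r)
}

// groupSubjects maps each slug to the sorted, de-duplicated display
// names that produce it.
func groupSubjects(names []string) map[string][]string {
	seen := map[string]map[string]struct{}{}
	groups := map[string][]string{}
	for _, n := range names {
		display := capitalize(n)
		if display == "" {
			continue
		}
		key := toSlug(display)
		if seen[key] == nil {
			seen[key] = map[string]struct{}{}
		}
		if _, dup := seen[key][display]; dup {
			continue
		}
		seen[key][display] = struct{}{}
		groups[key] = append(groups[key], display)
	}
	for k := range groups {
		sort.Strings(groups[k])
	}
	return groups
}

func main() {
	groups := groupSubjects([]string{"cronica", "crónica", "Cronica", "history"})
	fmt.Println(groups["cronica"], groups["history"])
}
```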

func (*BleveIndexer) TotalWordCount added in v4.15.0

func (b *BleveIndexer) TotalWordCount(slugs []string) (float64, error)

TotalWordCount returns the sum of word counts for the documents matching the given slugs.

type Config added in v4.20.0

type Config struct {
	// IllustratedMinAmount is the minimum number of illustrations (excluding cover) for a document to be considered illustrated.
	IllustratedMinAmount int
	// IllustratedMinSize is the minimum size in megapixels for an image to count as an illustration.
	IllustratedMinSize float64
}

Config holds indexer configuration.
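One way to read the two Config fields together is sketched below: count the non-cover images that reach the minimum size, then compare that count with the minimum amount. The image type, the megapixel calculation, and the inclusive comparisons are assumptions for illustration, not the package's implementation.

```go
package main

import "fmt"

// config mirrors the two fields of Config shown above.
type config struct {
	IllustratedMinAmount int
	IllustratedMinSize   float64
}

// image is a hypothetical description of an image found in a document.
type image struct {
	Width, Height int
	IsCover       bool
}

// megapixels returns the image size in millions of pixels.
func megapixels(img image) float64 {
	return float64(img.Width*img.Height) / 1_000_000
}

// isIllustrated counts images that are not the cover and are at least
// IllustratedMinSize megapixels, then checks the count against
// IllustratedMinAmount.
func isIllustrated(images []image, cfg config) bool {
	count := 0
	for _, img := range images {
		if img.IsCover {
			continue
		}
		if megapixels(img) >= cfg.IllustratedMinSize {
			count++
		}
	}
	return count >= cfg.IllustratedMinAmount
}

func main() {
	images := []image{
		{Width: 2000, Height: 3000, IsCover: true},
		{Width: 1000, Height: 1000},
		{Width: 100, Height: 100},
	}
	fmt.Println(isIllustrated(images, config{IllustratedMinAmount: 1, IllustratedMinSize: 0.5}))
	fmt.Println(isIllustrated(images, config{IllustratedMinAmount: 2, IllustratedMinSize: 0.5}))
}
```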

type Document

type Document struct {
	metadata.Metadata
	ID            string
	Slug          string
	AuthorsSlugs  []string
	SeriesSlug    string
	SubjectsSlugs []string
	AddedOn       time.Time
}

func (Document) BleveType added in v4.3.0

func (d Document) BleveType() string

BleveType is part of the bleve.Classifier interface and its purpose is to tell the indexer the type of the document, which will be used to decide which analyzer will parse it.

type IndexedFile added in v4.21.0

type IndexedFile struct {
	Document    Document
	Data        []byte
	FileName    string
	ContentType string
}

IndexedFile holds the bytes and metadata for a document download.

type Progress

type Progress struct {
	RemainingTime time.Duration
	Percentage    float64
}

type SearchFields added in v4.9.0

type SearchFields struct {
	Keywords        string
	Language        string
	Subjects        string
	PubDateFrom     date.Date
	PubDateTo       date.Date
	EstReadTimeFrom float64
	EstReadTimeTo   float64
	WordsPerMinute  float64
	IllustratedOnly bool
	SortBy          []string
}
