merger

package
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 24, 2024 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package merger provides a different set of algorithms for solving T-overlap occurrence problem of sorted lists of integers. T-occurrence problem can be described next: - Find the set of string ids that appear at least T times on the inverted lists, where T is a constant.

Index

Constants

View Source
const MaxOverlap = 0xFFFF

MaxOverlap is the largest value of an overlap count for a merge candidate

Variables

View Source
var ErrCollectionTerminated = errors.New("collection terminated")

ErrCollectionTerminated tells to terminate collection of the current workflow This error is going to be processed by Intersector/Merger and not to be propagated up

View Source
var (
	// ErrIteratorIsNotDereferencable tells than iterator has an invalid state
	ErrIteratorIsNotDereferencable = errors.New("iterator is not dereferencable")
)

Functions

This section is empty.

Types

type Collector

type Collector interface {
	// Collect collects the given candidate
	Collect(candidate MergeCandidate) error
}

Collector collects the doc stream satisfied to a search criteria

type ListIntersector

type ListIntersector interface {
	// Intersect performs intersection operation for the given rid and
	// transmits the result to collector
	Intersect(rid Rid, collector Collector) error
}

ListIntersector is the interface that is responsible for intersection operation between array of docs iterators

func Intersector

func Intersector() ListIntersector

Intersector creates a new instance of ListIntersector

type ListIterator

type ListIterator interface {
	// Get returns the current pointed element of the list
	Get() (uint32, error)
	// HasNext tells if the given iterator can be moved to the next record
	HasNext() bool
	// Next moves the given iterator to the next record
	Next() (uint32, error)
	// LowerBound moves the given iterator to the smallest record x
	// in corresponding list such that x >= to
	LowerBound(to uint32) (uint32, error)
	// Len returns the actual size of the list
	Len() int
}

ListIterator is the interface of a posting list iterator

type ListMerger

type ListMerger interface {
	// Merge returns list of candidates, that appears at least `threshold` times.
	Merge(rid Rid, threshold int, collector Collector) error
}

ListMerger solves `threshold`-occurrence problem: For given inverted lists find the set of strings ids, that appears at least `threshold` times.

func CPMerge

func CPMerge() ListMerger

CPMerge was described in paper "Simple and Efficient Algorithm for Approximate Dictionary Matching" inspired by https://github.com/chokkan/simstring

func DivideSkip

func DivideSkip(mu float64) ListMerger

DivideSkip was described in paper "Efficient Merging and Filtering Algorithms for Approximate String Searches" We have to choose `good` parameter mu, for improving speed. So, mu depends only on given dictionary, so we can find it

func MergeSkip

func MergeSkip() ListMerger

MergeSkip was described in paper "Efficient Merging and Filtering Algorithms for Approximate String Searches" Formally, main idea is to skip on the lists those record ids that cannot be in the answer to the query, by utilizing the threshold

func ScanCount

func ScanCount() ListMerger

ScanCount scan the N inverted lists one by one. For each string id on each list, we increment the count corresponding to the string by 1. We report the string ids that appear at least `threshold` times on the lists.

type MergeCandidate

type MergeCandidate uint64

MergeCandidate is result of merging Rid

func NewMergeCandidate

func NewMergeCandidate(position, overlap uint32) MergeCandidate

NewMergeCandidate creates a new instance of MergeCandidate

func (MergeCandidate) Overlap

func (m MergeCandidate) Overlap() int

Overlap returns the current overlap count of the candidate

func (MergeCandidate) Position

func (m MergeCandidate) Position() uint32

Position returns the given position of the candidate

type Rid

type Rid []ListIterator

Rid represents inverted lists for ListMerger

func (Rid) Len

func (p Rid) Len() int

Len is the number of elements in the collection.

func (Rid) Less

func (p Rid) Less(i, j int) bool

Less reports whether the element with index i should sort before the element with index j.

func (Rid) Swap

func (p Rid) Swap(i, j int)

Swap swaps the elements with indexes i and j.

type SimpleCollector

type SimpleCollector struct {
	Candidates []MergeCandidate
}

SimpleCollector a dummy implementation of Collector

func (*SimpleCollector) Collect

func (c *SimpleCollector) Collect(candidate MergeCandidate) error

Collect collects the given candidate

type SliceIterator

type SliceIterator struct {
	// contains filtered or unexported fields
}

SliceIterator represents the ListIterator interface for a slice of uint32

func NewSliceIterator

func NewSliceIterator(slice []uint32) *SliceIterator

NewSliceIterator returns a new instance of a slice iterator

func (*SliceIterator) Get

func (i *SliceIterator) Get() (uint32, error)

Get returns the current pointed element of the list

func (*SliceIterator) HasNext

func (i *SliceIterator) HasNext() bool

HasNext tells if the given iterator can be moved to the next record

func (*SliceIterator) IsValid

func (i *SliceIterator) IsValid() bool

IsValid returns true if the given iterator is dereferencable, otherwise returns false

func (*SliceIterator) Len

func (i *SliceIterator) Len() int

Len returns the actual size of the list

func (*SliceIterator) LowerBound

func (i *SliceIterator) LowerBound(to uint32) (uint32, error)

LowerBound moves the given iterator to the smallest record x in corresponding list such that x >= to

func (*SliceIterator) Next

func (i *SliceIterator) Next() (uint32, error)

Next moves the given iterator to the next record

func (*SliceIterator) Reset

func (i *SliceIterator) Reset(slice []uint32)

Reset performs reset operation with the provided slice

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL