Documentation
¶
Overview ¶
Package merger provides a different set of algorithms for solving T-overlap occurrence problem of sorted lists of integers. T-occurrence problem can be described next: - Find the set of string ids that appear at least T times on the inverted lists, where T is a constant.
Index ¶
- Constants
- Variables
- type Collector
- type ListIntersector
- type ListIterator
- type ListMerger
- type MergeCandidate
- type Rid
- type SimpleCollector
- type SliceIterator
- func (i *SliceIterator) Get() (uint32, error)
- func (i *SliceIterator) HasNext() bool
- func (i *SliceIterator) IsValid() bool
- func (i *SliceIterator) Len() int
- func (i *SliceIterator) LowerBound(to uint32) (uint32, error)
- func (i *SliceIterator) Next() (uint32, error)
- func (i *SliceIterator) Reset(slice []uint32)
Constants ¶
const MaxOverlap = 0xFFFF
MaxOverlap is the largest value of an overlap count for a merge candidate
Variables ¶
var ErrCollectionTerminated = errors.New("collection terminated")
ErrCollectionTerminated tells to terminate collection of the current workflow This error is going to be processed by Intersector/Merger and not to be propagated up
var ( // ErrIteratorIsNotDereferencable tells than iterator has an invalid state ErrIteratorIsNotDereferencable = errors.New("iterator is not dereferencable") )
Functions ¶
This section is empty.
Types ¶
type Collector ¶
type Collector interface {
// Collect collects the given candidate
Collect(candidate MergeCandidate) error
}
Collector collects the doc stream satisfied to a search criteria
type ListIntersector ¶
type ListIntersector interface {
// Intersect performs intersection operation for the given rid and
// transmits the result to collector
Intersect(rid Rid, collector Collector) error
}
ListIntersector is the interface that is responsible for intersection operation between array of docs iterators
func Intersector ¶
func Intersector() ListIntersector
Intersector creates a new instance of ListIntersector
type ListIterator ¶
type ListIterator interface {
// Get returns the current pointed element of the list
Get() (uint32, error)
// HasNext tells if the given iterator can be moved to the next record
HasNext() bool
// Next moves the given iterator to the next record
Next() (uint32, error)
// LowerBound moves the given iterator to the smallest record x
// in corresponding list such that x >= to
LowerBound(to uint32) (uint32, error)
// Len returns the actual size of the list
Len() int
}
ListIterator is the interface of a posting list iterator
type ListMerger ¶
type ListMerger interface {
// Merge returns list of candidates, that appears at least `threshold` times.
Merge(rid Rid, threshold int, collector Collector) error
}
ListMerger solves `threshold`-occurrence problem: For given inverted lists find the set of strings ids, that appears at least `threshold` times.
func CPMerge ¶
func CPMerge() ListMerger
CPMerge was described in paper "Simple and Efficient Algorithm for Approximate Dictionary Matching" inspired by https://github.com/chokkan/simstring
func DivideSkip ¶
func DivideSkip(mu float64) ListMerger
DivideSkip was described in paper "Efficient Merging and Filtering Algorithms for Approximate String Searches" We have to choose `good` parameter mu, for improving speed. So, mu depends only on given dictionary, so we can find it
func MergeSkip ¶
func MergeSkip() ListMerger
MergeSkip was described in paper "Efficient Merging and Filtering Algorithms for Approximate String Searches" Formally, main idea is to skip on the lists those record ids that cannot be in the answer to the query, by utilizing the threshold
func ScanCount ¶
func ScanCount() ListMerger
ScanCount scan the N inverted lists one by one. For each string id on each list, we increment the count corresponding to the string by 1. We report the string ids that appear at least `threshold` times on the lists.
type MergeCandidate ¶
type MergeCandidate uint64
MergeCandidate is result of merging Rid
func NewMergeCandidate ¶
func NewMergeCandidate(position, overlap uint32) MergeCandidate
NewMergeCandidate creates a new instance of MergeCandidate
func (MergeCandidate) Overlap ¶
func (m MergeCandidate) Overlap() int
Overlap returns the current overlap count of the candidate
func (MergeCandidate) Position ¶
func (m MergeCandidate) Position() uint32
Position returns the given position of the candidate
type Rid ¶
type Rid []ListIterator
Rid represents inverted lists for ListMerger
type SimpleCollector ¶
type SimpleCollector struct {
Candidates []MergeCandidate
}
SimpleCollector a dummy implementation of Collector
func (*SimpleCollector) Collect ¶
func (c *SimpleCollector) Collect(candidate MergeCandidate) error
Collect collects the given candidate
type SliceIterator ¶
type SliceIterator struct {
// contains filtered or unexported fields
}
SliceIterator represents the ListIterator interface for a slice of uint32
func NewSliceIterator ¶
func NewSliceIterator(slice []uint32) *SliceIterator
NewSliceIterator returns a new instance of a slice iterator
func (*SliceIterator) Get ¶
func (i *SliceIterator) Get() (uint32, error)
Get returns the current pointed element of the list
func (*SliceIterator) HasNext ¶
func (i *SliceIterator) HasNext() bool
HasNext tells if the given iterator can be moved to the next record
func (*SliceIterator) IsValid ¶
func (i *SliceIterator) IsValid() bool
IsValid returns true if the given iterator is dereferencable, otherwise returns false
func (*SliceIterator) Len ¶
func (i *SliceIterator) Len() int
Len returns the actual size of the list
func (*SliceIterator) LowerBound ¶
func (i *SliceIterator) LowerBound(to uint32) (uint32, error)
LowerBound moves the given iterator to the smallest record x in corresponding list such that x >= to
func (*SliceIterator) Next ¶
func (i *SliceIterator) Next() (uint32, error)
Next moves the given iterator to the next record
func (*SliceIterator) Reset ¶
func (i *SliceIterator) Reset(slice []uint32)
Reset performs reset operation with the provided slice