Documentation ¶
Index ¶
- Constants
- func CosineSimilarity(f func(float32) bool) func(a, b []float32) bool
- func Dissimilar(a, b []float32) bool
- func HighSimilarity(a, b []float32) bool
- func MediumSimilarity(a, b []float32) bool
- func NewSentencer(eos string, r io.Reader) *bufio.Scanner
- func NewSlicer(delim string, r io.Reader) *bufio.Scanner
- func RangeSimilarity(lo, hi float32) func(a, b []float32) bool
- func WeakSimilarity(a, b []float32) bool
- type Chunker
- type Embedder
- type Identity
- type Scanner
- type Semantic
- type Sentencer
- type SimilarityWith
- type Slicer
- type Sorter
Constants ¶
const EndOfSentence = ".!?"
EndOfSentence is the default set of end-of-sentence runes.
Variables ¶
This section is empty.
Functions ¶
func CosineSimilarity ¶ added in v0.0.2
CosineSimilarity builds a similarity predicate from a custom assertion on the cosine distance.
func Dissimilar ¶ added in v0.0.2
Dissimilar is cosine distance (0.8, 1.0]. Typically, these items are unrelated, and you might filter them out unless dissimilarity is desirable (e.g., in anomaly detection).
func HighSimilarity ¶ added in v0.0.2
High Similarity is cosine distance [0, 0.2]. Use this range when you need very close matches (e.g., finding duplicate documents).
func MediumSimilarity ¶ added in v0.0.2
Medium Similarity is cosine distance (0.2, 0.5]. Useful when you want to find items that are related but not identical.
func NewSentencer ¶
NewSentencer creates a scanner that slices the input stream into sentences at end-of-sentence runes.
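The library does not publish its split implementation, but a sentence-slicing scanner of this kind can be sketched with the standard library alone. The sketch below is an assumption of how such a scanner might work: `splitSentences` is a hypothetical bufio.SplitFunc that cuts after any rune in the EndOfSentence set.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// EndOfSentence mirrors the package constant.
const EndOfSentence = ".!?"

// splitSentences is a bufio.SplitFunc that emits one token per sentence,
// cutting after any of the end-of-sentence runes.
func splitSentences(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.IndexAny(data, EndOfSentence); i >= 0 {
		return i + 1, bytes.TrimSpace(data[:i+1]), nil
	}
	if atEOF && len(data) > 0 {
		return len(data), bytes.TrimSpace(data), nil
	}
	return 0, nil, nil // request more input
}

func sentences(text string) []string {
	s := bufio.NewScanner(strings.NewReader(text))
	s.Split(splitSentences)
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return out
}

func main() {
	fmt.Println(sentences("Hello world. How are you? Fine!"))
	// → [Hello world. How are you? Fine!]  (three tokens)
}
```

The same SplitFunc shape can be passed to bufio.NewScanner directly when a custom sentence breakdown is needed.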
func RangeSimilarity ¶ added in v0.0.2
RangeSimilarity is similarity on a custom cosine-distance interval [lo, hi]. Use it when none of the predefined ranges fit.
func WeakSimilarity ¶ added in v0.0.2
Weak Similarity is cosine distance (0.5, 0.8]. This range could be used for exploratory results where you want to include some diversity.
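The four predicates above are all bands over the same cosine distance. As an illustration only (the library's actual implementation is not shown here), `cosineDistance` and `rangeSimilarity` below are hypothetical stand-ins showing how such banded predicates can be derived from one distance function:

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance is 1 minus the cosine similarity of two vectors.
func cosineDistance(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// rangeSimilarity mimics RangeSimilarity: true when the cosine
// distance falls within [lo, hi].
func rangeSimilarity(lo, hi float32) func(a, b []float32) bool {
	return func(a, b []float32) bool {
		d := cosineDistance(a, b)
		return d >= float64(lo) && d <= float64(hi)
	}
}

func main() {
	high := rangeSimilarity(0, 0.2)   // near-duplicates
	weak := rangeSimilarity(0.5, 0.8) // exploratory matches

	a := []float32{1, 0}
	b := []float32{1, 0.1} // almost the same direction as a
	c := []float32{0.5, 1} // diverges noticeably from a

	fmt.Println(high(a, b), weak(a, c)) // → true true
}
```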
Types ¶
type Chunker ¶
type Chunker struct {
Scanner
// contains filtered or unexported fields
}
func NewChunker ¶
type Identity ¶
func NewIdentity ¶
type Scanner ¶
Scanner is an interface similar to bufio.Scanner. It defines the core functionality provided by this library.
type Semantic ¶ added in v0.0.2
type Semantic struct {
// contains filtered or unexported fields
}
Semantic provides a convenient solution for semantic chunking. Successive calls to the Semantic.Scan method step through the context windows of a file, grouping sentences semantically. The context window is defined by a number of sentences; use the Window method to change the default of 32 sentences.
The specification of a sentence is defined by the Scanner interface, which is compatible with bufio.NewScanner. Use a Split function of type SplitFunc within bufio.NewScanner to control sentence breakdown.
The package provides the NewSentencer utility, which breaks the input into sentences using punctuation runes. Redefine the Split function of bufio.NewScanner to use your own algorithm.
The scanner uses embeddings to determine similarity. Use the Similarity method to replace the default high cosine similarity with your own implementation. The module provides high, medium, weak, and dissimilarity functions based on cosine distance.
Scanning stops unrecoverably at EOF or the first I/O error.
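The core idea of the chunking loop can be sketched without the package itself. In this sketch, `chunk`, the toy embedder, and the equality predicate are all hypothetical stand-ins, not the library's API: a sentence joins the current chunk while it is similar to the chunk's anchor, otherwise a new chunk starts.

```go
package main

import "fmt"

// similar is a stand-in for a cosine-distance predicate such as HighSimilarity.
type similar func(a, b []float32) bool

// chunk groups consecutive sentences: a sentence joins the current chunk
// while it is similar to the chunk's anchor (its first sentence, i.e. the
// SIMILARITY_WITH_HEAD strategy); otherwise a new chunk is started.
func chunk(embed func(string) []float32, eq similar, sentences []string) [][]string {
	var chunks [][]string
	var cur []string
	var anchor []float32
	for _, s := range sentences {
		v := embed(s)
		if cur == nil || !eq(anchor, v) {
			if cur != nil {
				chunks = append(chunks, cur)
			}
			cur, anchor = []string{s}, v
			continue
		}
		cur = append(cur, s)
	}
	if cur != nil {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	// Toy embedder: topic A → (1,0), anything else → (0,1).
	embed := func(s string) []float32 {
		if s[0] == 'A' {
			return []float32{1, 0}
		}
		return []float32{0, 1}
	}
	eq := func(a, b []float32) bool { return a[0] == b[0] && a[1] == b[1] }
	fmt.Println(chunk(embed, eq, []string{"A1", "A2", "B1", "B2", "A3"}))
	// → [[A1 A2] [B1 B2] [A3]]
}
```

The real Semantic additionally bounds the work to a context window and delegates sentence splitting to the Scanner, but the grouping decision follows this shape.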
func NewSemantic ¶ added in v0.0.2
NewSemantic creates a new instance of the scanner that reads from an io.Reader and uses the given embedding.
func (*Semantic) Scan ¶ added in v0.0.2
Scan advances the Semantic through the context window; the sequence is available through Semantic.Text. It returns false when an I/O error occurs or EOF is reached.
func (*Semantic) Similarity ¶ added in v0.0.2
Similarity sets the similarity function for the Semantic. The default is HighSimilarity.
func (*Semantic) SimilarityWith ¶ added in v0.0.2
func (s *Semantic) SimilarityWith(x SimilarityWith)
SimilarityWith sets the anchoring behavior of the sorting algorithm.
Using SIMILARITY_WITH_HEAD configures the algorithm to sort a chunk by similarity to the first element of the chunk. The first element stays stable while the chunk is being formed.
Using SIMILARITY_WITH_TAIL configures the algorithm to sort a chunk by similarity to the last element of the chunk. The last element changes each time a new element is added to the chunk.
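The practical difference between the two strategies is drift. In this sketch (a simplified model using scalar "embeddings" and a hypothetical `group` helper, not the library's implementation), a moving tail anchor lets a chunk follow a chain of pairwise-similar items that the fixed head anchor would break apart:

```go
package main

import "fmt"

// group clusters consecutive values. With tail=false the anchor stays fixed
// at the chunk's first element (SIMILARITY_WITH_HEAD); with tail=true the
// anchor is replaced by each newly added element (SIMILARITY_WITH_TAIL),
// so the chunk can drift along a chain of pairwise-similar items.
func group(xs []float32, near func(a, b float32) bool, tail bool) [][]float32 {
	var out [][]float32
	var cur []float32
	var anchor float32
	for _, x := range xs {
		if cur == nil || !near(anchor, x) {
			if cur != nil {
				out = append(out, cur)
			}
			cur, anchor = []float32{x}, x
			continue
		}
		cur = append(cur, x)
		if tail {
			anchor = x // tail strategy: compare against the latest element
		}
	}
	if cur != nil {
		out = append(out, cur)
	}
	return out
}

func main() {
	// "Similar" means at most 1 apart.
	near := func(a, b float32) bool { d := a - b; if d < 0 { d = -d }; return d <= 1 }
	xs := []float32{0, 1, 2, 3}
	fmt.Println(group(xs, near, false)) // head: [[0 1] [2 3]]
	fmt.Println(group(xs, near, true))  // tail: [[0 1 2 3]]
}
```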
type SimilarityWith ¶ added in v0.0.2
type SimilarityWith int
Configure similarity sorting algorithm
const (
	SIMILARITY_WITH_HEAD SimilarityWith = iota
	SIMILARITY_WITH_TAIL
)
Configure similarity sorting algorithm
type Sorter ¶ added in v0.0.2
type Sorter[T any] struct {
	// contains filtered or unexported fields
}
Sorter provides a convenient solution for semantic sorting.
Successive calls to the Sorter.Sort method step through the context windows of a slice, grouping 'sentences' semantically. The context window is defined by a number of sentences; use the Window method to change the default of 32 sentences.
The input slice is assumed to be split into sentences already.
The sorter uses embeddings to determine similarity. Use the Similarity method to replace the default high cosine similarity with your own implementation. The module provides high, medium, weak, and dissimilarity functions based on cosine distance.
func NewSorter ¶ added in v0.0.2
NewSorter creates a new instance of the semantic Sorter; seq.Seq[T] is the source of records.
func (*Sorter[T]) Next ¶ added in v0.0.2
Next advances the Sorter through the context window; the sequence is available through [Scanner.Text]. It returns false when an I/O error occurs or EOF is reached.
func (*Sorter[T]) Similarity ¶ added in v0.0.2
Similarity sets the similarity function for the Sorter. The default is HighSimilarity.
func (*Sorter[T]) SimilarityWith ¶ added in v0.0.2
func (s *Sorter[T]) SimilarityWith(x SimilarityWith)
SimilarityWith sets the anchoring behavior of the sorting algorithm.
Using SIMILARITY_WITH_HEAD configures the algorithm to sort a chunk by similarity to the first element of the chunk. The first element stays stable while the chunk is being formed.
Using SIMILARITY_WITH_TAIL configures the algorithm to sort a chunk by similarity to the last element of the chunk. The last element changes each time a new element is added to the chunk.