Documentation ¶
Index ¶
- func BestPairCombinationJaroWinkler(searchTokens []string, indexedTokens []string) float64
- func BestPairCombinationJaroWinklerWeighted(searchTokens []string, indexedTokens []string, searchWeights []float64, ...) float64
- func BestPairsJaroWinkler(searchTokens []string, indexedTokens []string) float64
- func BestPairsJaroWinklerWeighted(searchTokens []string, indexedTokens []string, searchWeights []float64, ...) float64
- func GenerateWordCombinations(tokens []string) [][]string
- func JaroWinkler(s1, s2 string) float64
- func JaroWinklerWithFavoritism(indexedTerm, query string, favoritism float64) float64
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BestPairCombinationJaroWinkler ¶ added in v0.51.0
func BestPairCombinationJaroWinkler(searchTokens []string, indexedTokens []string) float64
BestPairCombinationJaroWinkler compares a search query to an indexed term, with improved handling of short words and spacing variations.
func BestPairCombinationJaroWinklerWeighted ¶ added in v0.57.0
func BestPairCombinationJaroWinklerWeighted(searchTokens []string, indexedTokens []string, searchWeights []float64, indexWeights []float64) float64
BestPairCombinationJaroWinklerWeighted is like BestPairCombinationJaroWinkler but uses TF-IDF weights.
func BestPairsJaroWinkler ¶
func BestPairsJaroWinkler(searchTokens []string, indexedTokens []string) float64
BestPairsJaroWinkler compares a search query to an indexed term (name, address, etc.) and returns a decimal fraction score.
The algorithm splits each string into tokens and computes a Jaro-Winkler score for every pair of search and index tokens (the outer product). It then picks the best match for each search token, with each index token matched at most once.
The pairwise scores are combined into an average that corrects for character length and for the fraction of the indexed term that did not match.
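The matching step described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `bestPairs` and `exactMatch` are hypothetical names, the greedy selection order is an assumption, and `exactMatch` stands in for Jaro-Winkler to keep the example short. The documentation does not say how the average and the unmatched fraction are merged into one score, so the sketch returns them separately.

```go
package main

import "fmt"

// exactMatch is a toy similarity used only for this example.
// The real package scores token pairs with Jaro-Winkler.
func exactMatch(a, b string) float64 {
	if a == b {
		return 1
	}
	return 0
}

// bestPairs is a hypothetical sketch. It returns the length-weighted average
// of the best pair scores and the fraction of indexed characters left unmatched.
func bestPairs(search, indexed []string, sim func(a, b string) float64) (score, unmatched float64) {
	used := make([]bool, len(indexed))
	var weightedSum, totalWeight float64
	matchedChars, indexedChars := 0, 0
	for _, t := range indexed {
		indexedChars += len([]rune(t))
	}
	for _, s := range search {
		best, bestIdx := 0.0, -1
		for j, t := range indexed {
			if used[j] {
				continue // each indexed token may be matched at most once
			}
			if sc := sim(s, t); sc > best {
				best, bestIdx = sc, j
			}
		}
		w := float64(len([]rune(s)))
		totalWeight += w // a search token with no match still counts, with score 0
		if bestIdx >= 0 {
			used[bestIdx] = true
			weightedSum += best * w
			matchedChars += len([]rune(indexed[bestIdx]))
		}
	}
	if totalWeight == 0 || indexedChars == 0 {
		return 0, 1
	}
	return weightedSum / totalWeight, 1 - float64(matchedChars)/float64(indexedChars)
}

func main() {
	score, unmatched := bestPairs(
		[]string{"john", "smith"},
		[]string{"smith", "john", "jr"},
		exactMatch,
	)
	fmt.Printf("score=%.2f unmatched=%.2f\n", score, unmatched)
}
```

Weighting by the search token's character count keeps a short token such as "jr" from moving the average as much as a long surname would.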
func BestPairsJaroWinklerWeighted ¶ added in v0.57.0
func BestPairsJaroWinklerWeighted(searchTokens []string, indexedTokens []string, searchWeights []float64, indexWeights []float64) float64
BestPairsJaroWinklerWeighted compares a search query to an indexed term using TF-IDF weights. The algorithm is similar to BestPairsJaroWinkler but uses TF-IDF weights instead of character length to weight the importance of each matched term pair.
searchWeights and indexWeights should each have the same length as the corresponding token slice. If either weight slice is nil or its length differs from its token slice, the function falls back to unweighted scoring.
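The fallback rule can be illustrated with a small weighted mean. This is a sketch under assumptions: `combineScores` is a hypothetical helper, it takes one score per matched pair and a single weight slice, and the way the package derives a pair weight from searchWeights and indexWeights is not documented here. Treating an all-zero weight slice as unweighted is also an assumption.

```go
package main

import "fmt"

// combineScores is a hypothetical helper, not part of the package.
// It returns the weighted mean of pairScores, or the plain mean when the
// weights are nil, have the wrong length, or sum to zero (assumed fallback).
func combineScores(pairScores, weights []float64) float64 {
	if len(pairScores) == 0 {
		return 0
	}
	total := 0.0
	for _, w := range weights {
		total += w
	}
	if weights == nil || len(weights) != len(pairScores) || total == 0 {
		sum := 0.0
		for _, s := range pairScores {
			sum += s
		}
		return sum / float64(len(pairScores))
	}
	sum := 0.0
	for i, s := range pairScores {
		sum += s * weights[i]
	}
	return sum / total
}

func main() {
	scores := []float64{1.0, 0.5}
	// A rare token (high weight) dominates a common one (low weight).
	fmt.Println(combineScores(scores, []float64{3, 1}))
	// Nil weights fall back to the plain mean.
	fmt.Println(combineScores(scores, nil))
}
```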
func GenerateWordCombinations ¶ added in v0.51.0
func GenerateWordCombinations(tokens []string) [][]string
GenerateWordCombinations creates variations of the input words by combining short words with their neighbors, to handle cases like "JSC ARGUMENT" vs "JSCARGUMENT".
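A simplified sketch of the idea is shown below. It is not the package's code: `wordCombinations` is a hypothetical stand-in, the three-character cutoff for a "short" word is an assumption, and this version only merges a short word with the word that follows it.

```go
package main

import "fmt"

// shortWordLen is an assumed cutoff; the documentation does not state the
// length the package treats as "short".
const shortWordLen = 3

// wordCombinations is a hypothetical stand-in for GenerateWordCombinations.
// It returns the original tokens plus one variant per short token, with that
// token merged into the token that follows it.
func wordCombinations(tokens []string) [][]string {
	out := [][]string{append([]string(nil), tokens...)}
	for i := 0; i+1 < len(tokens); i++ {
		if len([]rune(tokens[i])) > shortWordLen {
			continue
		}
		merged := make([]string, 0, len(tokens)-1)
		merged = append(merged, tokens[:i]...)
		merged = append(merged, tokens[i]+tokens[i+1])
		merged = append(merged, tokens[i+2:]...)
		out = append(out, merged)
	}
	return out
}

func main() {
	for _, combo := range wordCombinations([]string{"JSC", "ARGUMENT"}) {
		fmt.Println(combo)
	}
}
```

Scoring a query against each variant and keeping the best result lets "JSC ARGUMENT" match an indexed "JSCARGUMENT" without a penalty for the missing space.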
func JaroWinkler ¶
func JaroWinkler(s1, s2 string) float64
JaroWinkler runs the Jaro-Winkler algorithm over the two input strings and averages their match percentages according to the second string (assumed to be the user's query).
Each term is compared against a few adjacent terms, and the highest near-neighbor match is accumulated.
For more details see https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
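For reference, the textbook character-level Jaro-Winkler similarity can be sketched as below. This is the standard algorithm from the linked article, not this package's `JaroWinkler`, which additionally works term by term as described above. The names `jaro` and `jaroWinkler` are local to the example, and the 0.1 prefix scale and 4-character prefix cap are the conventional defaults.

```go
package main

import "fmt"

// jaro returns the Jaro similarity of two strings, in [0, 1].
func jaro(s1, s2 string) float64 {
	a, b := []rune(s1), []rune(s2)
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	// Characters match only if they lie within this distance of each other.
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order.
	outOfOrder, j := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[j] {
			j++
		}
		if a[i] != b[j] {
			outOfOrder++
		}
		j++
	}
	m := float64(matches)
	t := float64(outOfOrder) / 2 // transpositions
	return (m/float64(len(a)) + m/float64(len(b)) + (m-t)/m) / 3
}

// jaroWinkler boosts the Jaro score for strings sharing a common prefix.
func jaroWinkler(s1, s2 string) float64 {
	score := jaro(s1, s2)
	a, b := []rune(s1), []rune(s2)
	prefix := 0
	for prefix < 4 && prefix < len(a) && prefix < len(b) && a[prefix] == b[prefix] {
		prefix++
	}
	return score + float64(prefix)*0.1*(1-score)
}

func main() {
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA")) // prints 0.9611
}
```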
Types ¶
This section is empty.