Documentation ¶
Index ¶
- func ApproxTokens(content string) int
- func CloneStrings(in []string) []string
- func CollapseWhitespace(value string) string
- func CompactWhitespace(value string, limit int, empty string) string
- func ContainsAllSubstringsFold(value string, markers []string) bool
- func ContainsAnySubstring(value string, markers []string) bool
- func ContainsAnySubstringFold(value string, markers []string) bool
- func ContainsEitherSubstring(a, b string) bool
- func ContainsEitherSubstringFold(a, b string) bool
- func ContainsEqualFoldTrimmed(values []string, want string) bool
- func ContainsTrimmed(values []string, want string) bool
- func CutAnyPrefixFold(value string, prefixes []string) (tail string, ok bool)
- func CutAroundAnySubstringFold(value string, markers []string) (before, after string, ok bool)
- func CutAroundAnySubstringFoldMatch(value string, markers []string) (before, marker, after string, ok bool)
- func CutBeforeAnySubstringFold(value string, markers ...string) (string, bool)
- func EqualFoldTrimmed(a, b string) bool
- func EqualTrimmed(a, b string) bool
- func FirstNonBlank(values ...string) string
- func FirstWords(content string, n int) string
- func FitsTokenBudget(used, cost, budget int, allowFirstOverBudget bool) bool
- func HasAnyPrefixFold(value string, prefixes ...string) bool
- func IsBlank(value string) bool
- func LowerTrimmed(value string) string
- func LowerTrimmedSet(values []string) map[string]struct{}
- func MatchesOptionalTrimmed(value, filter string) bool
- func MatchesOptionalTrimmedOrEmpty(value, filter string) bool
- func NonBlank(value string) bool
- func NormalizeUnique(values []string, normalize Normalizer, sortOutput bool) []string
- func Set(values []string, normalize Normalizer) map[string]struct{}
- func SortedSetValues(values map[string]struct{}, normalize Normalizer) []string
- func TrimQuestionPhraseBoundary(value string) string
- func TrimQuestionPunctuation(value string) string
- func TrimSentenceBoundary(value string) string
- func TrimSpaceAndQuotes(value string) string
- func TrimmedSet(values []string) map[string]struct{}
- func TruncateUTF8Bytes(value string, limit int) string
- func UniqueLowerTrimmed(values []string, sortOutput bool) []string
- func UniqueTrimmed(values []string, sortOutput bool) []string
- func UpperTrimmed(value string) string
- func WordCount(content string) int
- type Normalizer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ApproxTokens ¶
ApproxTokens returns Goncho's stable, low-cost token estimate for budgeting. Blank content is treated as one token so callers never undercount an empty but present field.
func CloneStrings ¶
CloneStrings returns a shallow copy of a string slice.
func CollapseWhitespace ¶
CollapseWhitespace trims leading/trailing whitespace and converts any run of Unicode whitespace to a single ASCII space.
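The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation; `collapseWhitespace` is a hypothetical local stand-in.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseWhitespace is a hypothetical stand-in for CollapseWhitespace.
// strings.Fields splits on runs of Unicode whitespace (unicode.IsSpace) and
// drops leading and trailing runs, so joining the fields with a single
// ASCII space gives the described result.
func collapseWhitespace(value string) string {
	return strings.Join(strings.Fields(value), " ")
}

func main() {
	// U+00A0 (no-break space) counts as whitespace for strings.Fields.
	fmt.Printf("%q\n", collapseWhitespace("  alpha\t\n beta\u00a0gamma  ")) // prints "alpha beta gamma"
}
```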
func CompactWhitespace ¶
CompactWhitespace collapses whitespace and caps the result at limit bytes, trimming any partial trailing word or space at the boundary the same way existing preview callers historically did.
func ContainsAllSubstringsFold ¶
ContainsAllSubstringsFold reports whether value contains every non-blank marker after trimming markers and applying the same simple case-fold policy used by Goncho text filters. Blank markers are ignored.
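To illustrate the marker handling, here is a rough sketch under stated assumptions. It approximates the "simple case-fold policy" with `strings.ToLower`, and it returns true when every marker is blank; neither choice is confirmed by the documentation. `containsAllSubstringsFold` is a hypothetical local helper.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAllSubstringsFold is a hypothetical sketch, not the package's code.
// Assumption: simple case-folding is approximated with strings.ToLower.
// Assumption: with no non-blank markers the result is true, because no
// requirement remains; the real function may decide otherwise.
func containsAllSubstringsFold(value string, markers []string) bool {
	folded := strings.ToLower(value)
	for _, marker := range markers {
		marker = strings.TrimSpace(marker)
		if marker == "" {
			continue // blank markers are ignored
		}
		if !strings.Contains(folded, strings.ToLower(marker)) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(containsAllSubstringsFold("Remember My Name", []string{" remember ", "", "NAME"})) // true
	fmt.Println(containsAllSubstringsFold("Remember My Name", []string{"remember", "email"}))      // false
}
```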
func ContainsAnySubstring ¶
ContainsAnySubstring reports whether value contains at least one marker.
func ContainsAnySubstringFold ¶
ContainsAnySubstringFold reports whether value contains at least one marker, comparing with the same simple case-fold policy used by Goncho text filters.
func ContainsEitherSubstring ¶
ContainsEitherSubstring reports whether either value contains the other.
func ContainsEitherSubstringFold ¶
ContainsEitherSubstringFold reports whether either value contains the other after applying the same simple case-fold policy used by Goncho text filters.
func ContainsEqualFoldTrimmed ¶
ContainsEqualFoldTrimmed reports whether values contains want after trimming ASCII/Unicode whitespace and applying Unicode case-folding.
func ContainsTrimmed ¶
ContainsTrimmed reports whether values contains want after trimming ASCII/Unicode whitespace on both sides.
func CutAnyPrefixFold ¶
CutAnyPrefixFold removes the first matching prefix using the same simple case-fold policy as Goncho text classifiers. The returned tail preserves the original casing and spacing from value.
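A sketch of how such a prefix cut might work, so that the tail keeps its original casing and spacing. Assumptions in this sketch: prefixes are tried in slice order, empty prefixes are skipped, a miss returns the unchanged value with ok set to false, and slicing by the prefix's byte length is adequate (true for ASCII prefixes). `cutAnyPrefixFold` is a hypothetical local helper.

```go
package main

import (
	"fmt"
	"strings"
)

// cutAnyPrefixFold is a hypothetical sketch, not the package's code.
// It compares the leading bytes of value against each prefix with
// strings.EqualFold and slices the original string, so the tail is never
// re-cased or re-spaced.
func cutAnyPrefixFold(value string, prefixes []string) (tail string, ok bool) {
	for _, prefix := range prefixes {
		if prefix == "" || len(value) < len(prefix) {
			continue // assumption: empty prefixes never match
		}
		if strings.EqualFold(value[:len(prefix)], prefix) {
			return value[len(prefix):], true
		}
	}
	return value, false // assumption: a miss returns value unchanged
}

func main() {
	tail, ok := cutAnyPrefixFold("WHAT IS  my name", []string{"who is", "what is"})
	fmt.Printf("%q %v\n", tail, ok) // prints "  my name" true
}
```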
func CutAroundAnySubstringFold ¶
CutAroundAnySubstringFold splits value around the first matching marker using simple case-folding. The returned parts preserve the original casing and spacing from value.
func CutAroundAnySubstringFoldMatch ¶
func CutAroundAnySubstringFoldMatch(value string, markers []string) (before, marker, after string, ok bool)
CutAroundAnySubstringFoldMatch is like CutAroundAnySubstringFold but also returns the policy marker that matched.
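The split can be sketched as follows. This is an illustration only: it assumes markers are tried in slice order, that the returned marker is the list entry rather than the matched text, and it folds only ASCII letters so that byte offsets in the folded copy line up with the original. `asciiLower` and `cutAroundAnySubstringFoldMatch` are hypothetical local helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// asciiLower lowers only A-Z, so the result has the same byte length as s
// and offsets found in it are valid in s. Bytes of multi-byte UTF-8
// sequences are >= 0x80 and are left untouched.
func asciiLower(s string) string {
	b := []byte(s)
	for i, c := range b {
		if c >= 'A' && c <= 'Z' {
			b[i] = c + ('a' - 'A')
		}
	}
	return string(b)
}

// cutAroundAnySubstringFoldMatch is a hypothetical sketch, not the package's
// code. It slices the original value, preserving its casing and spacing.
func cutAroundAnySubstringFoldMatch(value string, markers []string) (before, marker, after string, ok bool) {
	folded := asciiLower(value)
	for _, m := range markers {
		if m == "" {
			continue // assumption: empty markers never match
		}
		if i := strings.Index(folded, asciiLower(m)); i >= 0 {
			return value[:i], m, value[i+len(m):], true
		}
	}
	return value, "", "", false // assumption: a miss returns value as before
}

func main() {
	before, marker, after, ok := cutAroundAnySubstringFoldMatch("My name IS Ada", []string{" are ", " is "})
	fmt.Printf("%q %q %q %v\n", before, marker, after, ok) // prints "My name" " is " "Ada" true
}
```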
func CutBeforeAnySubstringFold ¶
CutBeforeAnySubstringFold returns the portion of value before the first matching marker, using case-insensitive matching. Empty markers are ignored.
func EqualFoldTrimmed ¶
EqualFoldTrimmed reports whether two strings are equal after trimming ASCII/Unicode whitespace and applying Unicode case-folding.
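The standard library already offers both halves of this comparison. A minimal sketch, with `equalFoldTrimmed` as a hypothetical local stand-in rather than the package's implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// equalFoldTrimmed is a hypothetical stand-in for EqualFoldTrimmed.
// strings.TrimSpace removes leading and trailing Unicode whitespace, and
// strings.EqualFold compares under simple Unicode case-folding.
func equalFoldTrimmed(a, b string) bool {
	return strings.EqualFold(strings.TrimSpace(a), strings.TrimSpace(b))
}

func main() {
	fmt.Println(equalFoldTrimmed("  Émile\t", "éMILE")) // true
	fmt.Println(equalFoldTrimmed("Émile", "Emile"))     // false: accents are not folded
}
```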
func EqualTrimmed ¶
EqualTrimmed reports whether two strings are equal after trimming ASCII/Unicode whitespace.
func FirstNonBlank ¶
func FirstWords ¶
FirstWords returns the first n whitespace-delimited words from content. When content has n or fewer words, it preserves the caller-visible trimmed text instead of rebuilding spacing between words.
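The two cases described above can be sketched like this. Assumptions: a non-positive n yields an empty string, and the truncated case joins words with single spaces; the documentation states neither explicitly. `firstWords` is a hypothetical local helper.

```go
package main

import (
	"fmt"
	"strings"
)

// firstWords is a hypothetical sketch, not the package's code.
func firstWords(content string, n int) string {
	if n <= 0 {
		return "" // assumption: no words requested means empty output
	}
	fields := strings.Fields(content)
	if len(fields) <= n {
		// Short input: keep the caller-visible text, only trimmed, so
		// interior spacing is preserved.
		return strings.TrimSpace(content)
	}
	// Assumption: truncated output is rebuilt with single spaces.
	return strings.Join(fields[:n], " ")
}

func main() {
	fmt.Printf("%q\n", firstWords("  one   two  ", 3))    // prints "one   two"
	fmt.Printf("%q\n", firstWords("one   two three", 2)) // prints "one two"
}
```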
func FitsTokenBudget ¶
FitsTokenBudget reports whether an item costing cost tokens can be added once used tokens are already spent without exceeding budget. When allowFirstOverBudget is true, the first item is admitted even when it exceeds the budget so callers can return at least one relevant result.
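One plausible reading of this rule, sketched under assumptions: the check is `used+cost <= budget`, and "the first item" means nothing has been spent yet (`used == 0`). Neither detail is confirmed by the documentation. `fitsTokenBudget` is a hypothetical local helper.

```go
package main

import "fmt"

// fitsTokenBudget is a hypothetical sketch, not the package's code.
// Assumption: an item fits when used+cost stays within budget.
// Assumption: "first item" is detected by used == 0.
func fitsTokenBudget(used, cost, budget int, allowFirstOverBudget bool) bool {
	if used+cost <= budget {
		return true
	}
	return allowFirstOverBudget && used == 0
}

func main() {
	costs := []int{50, 3, 4, 9}
	used := 0
	for _, cost := range costs {
		if !fitsTokenBudget(used, cost, 10, true) {
			continue // skip items that no longer fit
		}
		used += cost
		fmt.Println("admitted item costing", cost)
	}
}
```

In this run only the first, oversized item is admitted: it passes through the first-item exception, and after that the budget is already exceeded.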
func HasAnyPrefixFold ¶
HasAnyPrefixFold reports whether value starts with any prefix, case-insensitively. Empty prefixes are ignored.
func LowerTrimmed ¶
LowerTrimmed trims surrounding whitespace and applies simple lower-casing.
func LowerTrimmedSet ¶
LowerTrimmedSet returns the set of distinct non-empty strings that remain after trimming and lower-casing.
func MatchesOptionalTrimmed ¶
MatchesOptionalTrimmed reports whether value satisfies an optional exact-match filter after trimming the filter. An empty filter matches every value.
func MatchesOptionalTrimmedOrEmpty ¶
MatchesOptionalTrimmedOrEmpty reports whether value satisfies an optional exact-match filter after trimming the filter, treating an empty value as legacy unscoped data that should not be excluded by the filter.
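The difference between the two optional filters can be sketched side by side. Assumptions: only the filter is trimmed, as the descriptions say, and "empty value" means exactly the empty string. Both helpers below are hypothetical local stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesOptionalTrimmed is a hypothetical sketch: an empty (after trimming)
// filter matches everything, otherwise value must equal the filter exactly.
func matchesOptionalTrimmed(value, filter string) bool {
	filter = strings.TrimSpace(filter)
	return filter == "" || value == filter
}

// matchesOptionalTrimmedOrEmpty additionally lets an empty value through,
// treating it as legacy unscoped data.
// Assumption: "empty" means value == "", not whitespace-only.
func matchesOptionalTrimmedOrEmpty(value, filter string) bool {
	filter = strings.TrimSpace(filter)
	return filter == "" || value == "" || value == filter
}

func main() {
	fmt.Println(matchesOptionalTrimmed("team-a", "  "))        // true: no filter set
	fmt.Println(matchesOptionalTrimmed("", " team-a "))        // false
	fmt.Println(matchesOptionalTrimmedOrEmpty("", " team-a ")) // true: unscoped value
}
```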
func NormalizeUnique ¶
func NormalizeUnique(values []string, normalize Normalizer, sortOutput bool) []string
NormalizeUnique returns non-empty normalized strings, preserving first-seen order unless sortOutput is true.
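A sketch of order-preserving deduplication in this style. It assumes the Normalizer has the shape `func(string) string`, which the page does not show; the local `normalizer` type, `trimLower`, and `normalizeUnique` are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizer is an assumed shape for the package's Normalizer type.
type normalizer func(string) string

// trimLower is an example normalizer used only for this illustration.
func trimLower(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// normalizeUnique is a hypothetical sketch, not the package's code.
func normalizeUnique(values []string, normalize normalizer, sortOutput bool) []string {
	seen := make(map[string]struct{}, len(values))
	var out []string
	for _, v := range values {
		if normalize != nil { // a nil normalizer keeps values as-is
			v = normalize(v)
		}
		if v == "" {
			continue // drop values that normalize to empty
		}
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v) // first-seen order
	}
	if sortOutput {
		sort.Strings(out)
	}
	return out
}

func main() {
	in := []string{" B", "a", "b ", "  ", "A"}
	fmt.Println(normalizeUnique(in, trimLower, false)) // [b a]
	fmt.Println(normalizeUnique(in, trimLower, true))  // [a b]
}
```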
func Set ¶
func Set(values []string, normalize Normalizer) map[string]struct{}
Set returns normalized non-empty strings as a set. It preserves nil for empty input or when every normalized value is empty.
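The nil-preserving behaviour can be sketched by allocating the map lazily. As before, the `func(string) string` shape of the normalizer is an assumption, and `normalizer`, `trim`, and `set` are hypothetical local names.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizer is an assumed shape for the package's Normalizer type.
type normalizer func(string) string

// trim is an example normalizer used only for this illustration.
func trim(s string) string { return strings.TrimSpace(s) }

// set is a hypothetical sketch, not the package's code. The map is created
// only when the first non-empty value appears, so empty input (or input
// that normalizes entirely to empty strings) yields nil.
func set(values []string, normalize normalizer) map[string]struct{} {
	var out map[string]struct{}
	for _, v := range values {
		if normalize != nil {
			v = normalize(v)
		}
		if v == "" {
			continue
		}
		if out == nil {
			out = make(map[string]struct{})
		}
		out[v] = struct{}{}
	}
	return out
}

func main() {
	fmt.Println(set(nil, nil) == nil)                     // true
	fmt.Println(set([]string{" ", ""}, trim) == nil)      // true
	fmt.Println(len(set([]string{"a", " a", "b"}, trim))) // 2
}
```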
func SortedSetValues ¶
func SortedSetValues(values map[string]struct{}, normalize Normalizer) []string
SortedSetValues returns the sorted non-empty keys in values after optional normalization.
func TrimQuestionPhraseBoundary ¶
TrimQuestionPhraseBoundary removes question punctuation, dots, and spaces as boundary characters, matching classifiers that accept loosely spaced prompts.
func TrimQuestionPunctuation ¶
TrimQuestionPunctuation removes leading/trailing question punctuation before trimming whitespace, matching the policy used by fact-question classifiers.
func TrimSentenceBoundary ¶
TrimSentenceBoundary removes the sentence punctuation policy used by recall fact classifiers, then trims surrounding whitespace. It intentionally keeps punctuation inside the value unchanged.
func TrimSpaceAndQuotes ¶
TrimSpaceAndQuotes trims surrounding whitespace, then removes quote-like boundary characters used by fact extraction and prompt classifiers.
func TrimmedSet ¶
TrimmedSet returns the set of distinct non-empty strings that remain after whitespace trimming.
func TruncateUTF8Bytes ¶
TruncateUTF8Bytes returns value truncated to at most limit bytes without splitting a UTF-8 encoded rune.
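One common way to truncate on a rune boundary is to back up from the byte limit until the next byte starts a rune. The sketch below assumes valid UTF-8 input and treats a non-positive limit as "return nothing"; `truncateUTF8Bytes` is a hypothetical local stand-in, not the package's implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8Bytes is a hypothetical sketch, not the package's code.
func truncateUTF8Bytes(value string, limit int) string {
	if limit <= 0 {
		return "" // assumption: non-positive limits yield an empty string
	}
	if len(value) <= limit {
		return value
	}
	// value[cut] is the first byte that would be dropped. If it is a
	// continuation byte, cutting here would split a rune, so step back
	// until the dropped part begins at a rune start.
	cut := limit
	for cut > 0 && !utf8.RuneStart(value[cut]) {
		cut--
	}
	return value[:cut]
}

func main() {
	// "é" occupies two bytes, so a 2-byte limit cannot include it.
	fmt.Printf("%q\n", truncateUTF8Bytes("héllo", 2)) // prints "h"
	fmt.Printf("%q\n", truncateUTF8Bytes("héllo", 3)) // prints "hé"
}
```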
func UniqueLowerTrimmed ¶
UniqueLowerTrimmed returns the distinct non-empty strings, as a slice, that remain after trimming and lower-casing.
func UniqueTrimmed ¶
UniqueTrimmed returns the distinct non-empty strings, as a slice, that remain after whitespace trimming.
func UpperTrimmed ¶
UpperTrimmed trims surrounding whitespace and converts the value to upper case.
Types ¶
type Normalizer ¶
type Normalizer = stringnorm.Normalizer
Normalizer is the shared contract for callers that canonicalize string values before set/unique operations. A nil Normalizer preserves values as-is.