textutil

package
v0.3.1 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 1, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ApproxTokens

func ApproxTokens(content string) int

ApproxTokens returns Goncho's stable, low-cost token estimate for budgeting. Blank content is treated as one token so callers never undercount an empty but present field.

func CloneStrings

func CloneStrings(in []string) []string

CloneStrings returns a shallow copy of a string slice.

func CollapseWhitespace

func CollapseWhitespace(value string) string

CollapseWhitespace trims leading/trailing whitespace and converts any run of Unicode whitespace to a single ASCII space.
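The described behavior can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation; collapseWhitespace is a hypothetical stand-in name, and it assumes that "Unicode whitespace" means what unicode.IsSpace reports.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseWhitespace is an illustrative stand-in, not the package's code.
// strings.Fields splits on runs of Unicode whitespace and drops empty fields,
// so joining the fields with one space trims both ends and collapses runs.
func collapseWhitespace(value string) string {
	return strings.Join(strings.Fields(value), " ")
}

func main() {
	fmt.Printf("%q\n", collapseWhitespace("  hello\t\n world\u00a0again  "))
	// prints "hello world again"
}
```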

func CompactWhitespace

func CompactWhitespace(value string, limit int, empty string) string

CompactWhitespace collapses whitespace and limits the result to limit bytes, trimming a partial trailing word/space boundary the same way existing preview callers historically did.

func ContainsAllSubstringsFold

func ContainsAllSubstringsFold(value string, markers []string) bool

ContainsAllSubstringsFold reports whether value contains every non-blank marker after trimming markers and applying the same simple case-fold policy used by Goncho text filters. Blank markers are ignored.
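A rough sketch of the all-markers check follows. It is not the package's code: containsAllSubstringsFold is a hypothetical name, strings.ToLower stands in for the "simple case-fold policy" (an assumption), and returning true when every marker is blank is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// containsAllSubstringsFold is an illustrative stand-in.
// Assumption: lower-casing both sides approximates the package's fold policy.
func containsAllSubstringsFold(value string, markers []string) bool {
	folded := strings.ToLower(value)
	for _, m := range markers {
		m = strings.TrimSpace(m)
		if m == "" {
			continue // blank markers are ignored
		}
		if !strings.Contains(folded, strings.ToLower(m)) {
			return false
		}
	}
	// Assumption: with no non-blank markers the check passes vacuously.
	return true
}

func main() {
	fmt.Println(containsAllSubstringsFold("Release Notes for Goncho", []string{" release ", "", "GONCHO"}))
	fmt.Println(containsAllSubstringsFold("Release Notes for Goncho", []string{"release", "missing"}))
}
```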

func ContainsAnySubstring

func ContainsAnySubstring(value string, markers []string) bool

ContainsAnySubstring reports whether value contains at least one marker.

func ContainsAnySubstringFold

func ContainsAnySubstringFold(value string, markers []string) bool

ContainsAnySubstringFold reports whether value contains at least one marker, comparing with the same simple case-fold policy used by Goncho text filters.

func ContainsEitherSubstring

func ContainsEitherSubstring(a, b string) bool

ContainsEitherSubstring reports whether either value contains the other.

func ContainsEitherSubstringFold

func ContainsEitherSubstringFold(a, b string) bool

ContainsEitherSubstringFold reports whether either value contains the other after applying the same simple case-fold policy used by Goncho text filters.

func ContainsEqualFoldTrimmed

func ContainsEqualFoldTrimmed(values []string, want string) bool

ContainsEqualFoldTrimmed reports whether values contains want after trimming ASCII/Unicode whitespace and applying Unicode case-folding.

func ContainsTrimmed

func ContainsTrimmed(values []string, want string) bool

ContainsTrimmed reports whether values contains want after trimming ASCII/Unicode whitespace on both sides.

func CutAnyPrefixFold

func CutAnyPrefixFold(value string, prefixes []string) (tail string, ok bool)

CutAnyPrefixFold removes the first matching prefix using the same simple case-fold policy as Goncho text classifiers. The returned tail preserves the original casing and spacing from value.

func CutAroundAnySubstringFold

func CutAroundAnySubstringFold(value string, markers []string) (before, after string, ok bool)

CutAroundAnySubstringFold splits value around the first matching marker using simple case-folding. The returned parts preserve the original casing and spacing from value.

func CutAroundAnySubstringFoldMatch

func CutAroundAnySubstringFoldMatch(value string, markers []string) (before, marker, after string, ok bool)

CutAroundAnySubstringFoldMatch is like CutAroundAnySubstringFold but also returns the entry from markers that matched.

func CutBeforeAnySubstringFold

func CutBeforeAnySubstringFold(value string, markers ...string) (string, bool)

CutBeforeAnySubstringFold returns value before the first matching marker, using case-insensitive matching. Empty markers are ignored.

func EqualFoldTrimmed

func EqualFoldTrimmed(a, b string) bool

EqualFoldTrimmed reports whether two strings are equal after trimming ASCII/Unicode whitespace and applying Unicode case-folding.
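The trim-then-fold comparison maps closely onto two standard library calls. This is a sketch under that assumption; equalFoldTrimmed is a hypothetical name and the package may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// equalFoldTrimmed is an illustrative stand-in, not the package's code.
// strings.TrimSpace removes leading and trailing Unicode whitespace, and
// strings.EqualFold compares under simple Unicode case-folding.
func equalFoldTrimmed(a, b string) bool {
	return strings.EqualFold(strings.TrimSpace(a), strings.TrimSpace(b))
}

func main() {
	fmt.Println(equalFoldTrimmed("  Goncho\n", "GONCHO")) // prints true
	fmt.Println(equalFoldTrimmed("a b", "ab"))            // prints false
}
```

Interior whitespace is not touched, so "a b" and "ab" stay distinct.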

func EqualTrimmed

func EqualTrimmed(a, b string) bool

EqualTrimmed reports whether two strings are equal after trimming ASCII/Unicode whitespace.

func FirstNonBlank

func FirstNonBlank(values ...string) string

func FirstWords

func FirstWords(content string, n int) string

FirstWords returns the first n whitespace-delimited words from content. When content has n or fewer words, it preserves the caller-visible trimmed text instead of rebuilding spacing between words.
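The two cases described above can be sketched as follows. firstWords is a hypothetical stand-in; joining the truncated words with single spaces and returning an empty string for a non-positive n are assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// firstWords is an illustrative stand-in, not the package's code.
func firstWords(content string, n int) string {
	if n <= 0 {
		return "" // assumption: a non-positive n yields no words
	}
	words := strings.Fields(content)
	if len(words) <= n {
		// Short input: keep the original interior spacing, only trim the ends.
		return strings.TrimSpace(content)
	}
	// Assumption: truncated output is rebuilt with single spaces.
	return strings.Join(words[:n], " ")
}

func main() {
	fmt.Printf("%q\n", firstWords("  one   two  ", 3))        // prints "one   two"
	fmt.Printf("%q\n", firstWords("one  two three four", 2)) // prints "one two"
}
```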

func FitsTokenBudget

func FitsTokenBudget(used, cost, budget int, allowFirstOverBudget bool) bool

FitsTokenBudget reports whether an item with cost can be added after used. When allowFirstOverBudget is true, the first item is admitted even when it exceeds the budget so callers can return at least one relevant result.
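One plausible reading of this rule is sketched below. fitsTokenBudget is a hypothetical stand-in, and treating used == 0 as the signal for "first item" is an assumption; the package may track that differently.

```go
package main

import "fmt"

// fitsTokenBudget is an illustrative stand-in, not the package's code.
func fitsTokenBudget(used, cost, budget int, allowFirstOverBudget bool) bool {
	if used+cost <= budget {
		return true
	}
	// Assumption: "first item" means nothing has been admitted yet.
	return allowFirstOverBudget && used == 0
}

func main() {
	costs := []int{40, 30, 50, 20}
	budget, used := 100, 0
	var admitted []int
	for _, cost := range costs {
		if !fitsTokenBudget(used, cost, budget, true) {
			continue // this sketch skips items that do not fit; stopping is equally plausible
		}
		used += cost
		admitted = append(admitted, cost)
	}
	fmt.Println(admitted, used) // prints [40 30 20] 90
}
```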

func HasAnyPrefixFold

func HasAnyPrefixFold(value string, prefixes ...string) bool

HasAnyPrefixFold reports whether value starts with any prefix, case-insensitively. Empty prefixes are ignored.
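A case-insensitive prefix test might look like the sketch below. hasAnyPrefixFold is a hypothetical stand-in, and it assumes a prefix and the text it matches occupy the same number of bytes, which holds for ASCII but not for every Unicode case pair.

```go
package main

import (
	"fmt"
	"strings"
)

// hasAnyPrefixFold is an illustrative stand-in, not the package's code.
func hasAnyPrefixFold(value string, prefixes ...string) bool {
	for _, p := range prefixes {
		if p == "" {
			continue // empty prefixes are ignored
		}
		// Assumption: byte lengths match under folding (true for ASCII).
		if len(value) >= len(p) && strings.EqualFold(value[:len(p)], p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasAnyPrefixFold("WHAT is this", "who", "what")) // prints true
	fmt.Println(hasAnyPrefixFold("anything", ""))                // prints false
}
```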

func IsBlank

func IsBlank(value string) bool

IsBlank reports whether value is empty after trimming Unicode whitespace.

func LowerTrimmed

func LowerTrimmed(value string) string

LowerTrimmed trims surrounding whitespace and applies simple lower-casing.

func LowerTrimmedSet

func LowerTrimmedSet(values []string) map[string]struct{}

LowerTrimmedSet returns distinct non-empty strings after trimming and lower-casing.

func MatchesOptionalTrimmed

func MatchesOptionalTrimmed(value, filter string) bool

MatchesOptionalTrimmed reports whether value satisfies an optional exact-match filter after trimming the filter. An empty filter matches every value.

func MatchesOptionalTrimmedOrEmpty

func MatchesOptionalTrimmedOrEmpty(value, filter string) bool

MatchesOptionalTrimmedOrEmpty reports whether value satisfies an optional exact-match filter after trimming the filter, treating an empty value as legacy unscoped data that should not be excluded by the filter.
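The difference between MatchesOptionalTrimmed and MatchesOptionalTrimmedOrEmpty can be sketched as follows. Both function names below are hypothetical stand-ins. The sketch assumes value is compared as-is, since the documentation mentions trimming only the filter, and that "empty value" means a zero-length string.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesOptionalTrimmed is an illustrative stand-in, not the package's code.
// An empty (after trimming) filter matches everything.
func matchesOptionalTrimmed(value, filter string) bool {
	filter = strings.TrimSpace(filter)
	return filter == "" || value == filter
}

// matchesOptionalTrimmedOrEmpty additionally lets empty values through.
// Assumption: "empty" means zero length, not merely blank.
func matchesOptionalTrimmedOrEmpty(value, filter string) bool {
	return value == "" || matchesOptionalTrimmed(value, filter)
}

func main() {
	fmt.Println(matchesOptionalTrimmed("alpha", "  alpha "))  // prints true
	fmt.Println(matchesOptionalTrimmed("", "alpha"))          // prints false
	fmt.Println(matchesOptionalTrimmedOrEmpty("", "alpha"))   // prints true
}
```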

func NonBlank

func NonBlank(value string) bool

NonBlank reports whether value has non-whitespace content.

func NormalizeUnique

func NormalizeUnique(values []string, normalize Normalizer, sortOutput bool) []string

NormalizeUnique returns non-empty normalized strings, preserving first-seen order unless sortOutput is true.
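The ordering rule can be illustrated with a small sketch. The shape of stringnorm.Normalizer is not shown on this page, so the normalizer type below (a plain func(string) string) is an assumption, and normalizeUnique and lowerTrim are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizer is an assumed shape; the real Normalizer may differ.
type normalizer func(string) string

// lowerTrim is a sample normalizer used only for this illustration.
func lowerTrim(s string) string { return strings.ToLower(strings.TrimSpace(s)) }

// normalizeUnique is an illustrative stand-in, not the package's code.
func normalizeUnique(values []string, normalize normalizer, sortOutput bool) []string {
	seen := make(map[string]struct{}, len(values))
	var out []string
	for _, v := range values {
		if normalize != nil { // a nil normalizer keeps values as-is
			v = normalize(v)
		}
		if v == "" {
			continue // drop values that normalize to empty
		}
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	if sortOutput {
		sort.Strings(out)
	}
	return out
}

func main() {
	in := []string{" B ", "a", "b", "", "A "}
	fmt.Println(normalizeUnique(in, lowerTrim, false)) // prints [b a]
	fmt.Println(normalizeUnique(in, lowerTrim, true))  // prints [a b]
}
```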

func Set

func Set(values []string, normalize Normalizer) map[string]struct{}

Set returns normalized non-empty strings as a set. It preserves nil for empty input or when every normalized value is empty.
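The nil-preserving behavior is easy to get by allocating the map lazily. The sketch below assumes a func(string) string normalizer, which may not match the real Normalizer type; set is a hypothetical stand-in name.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizer is an assumed shape; the real Normalizer may differ.
type normalizer func(string) string

// set is an illustrative stand-in, not the package's code.
// The map is allocated only when the first non-empty value appears, so empty
// input and all-empty input both yield nil.
func set(values []string, normalize normalizer) map[string]struct{} {
	var out map[string]struct{}
	for _, v := range values {
		if normalize != nil {
			v = normalize(v)
		}
		if v == "" {
			continue
		}
		if out == nil {
			out = make(map[string]struct{})
		}
		out[v] = struct{}{}
	}
	return out
}

func main() {
	fmt.Println(set(nil, nil) == nil)                                // prints true
	fmt.Println(set([]string{" ", ""}, strings.TrimSpace) == nil)    // prints true
	fmt.Println(len(set([]string{" a ", "a", " "}, strings.TrimSpace))) // prints 1
}
```

Callers can therefore test the result against nil instead of checking its length, and reading from a nil map is safe in Go.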

func SortedSetValues

func SortedSetValues(values map[string]struct{}, normalize Normalizer) []string

SortedSetValues returns the sorted non-empty keys in values after optional normalization.

func TrimQuestionPhraseBoundary

func TrimQuestionPhraseBoundary(value string) string

TrimQuestionPhraseBoundary removes question punctuation, dots, and spaces as boundary characters, matching classifiers that accept loosely spaced prompts.

func TrimQuestionPunctuation

func TrimQuestionPunctuation(value string) string

TrimQuestionPunctuation removes leading/trailing question punctuation before trimming whitespace, matching the policy used by fact-question classifiers.

func TrimSentenceBoundary

func TrimSentenceBoundary(value string) string

TrimSentenceBoundary removes the sentence punctuation policy used by recall fact classifiers, then trims surrounding whitespace. It intentionally keeps punctuation inside the value unchanged.

func TrimSpaceAndQuotes

func TrimSpaceAndQuotes(value string) string

TrimSpaceAndQuotes trims surrounding whitespace, then removes quote-like boundary characters used by fact extraction and prompt classifiers.

func TrimmedSet

func TrimmedSet(values []string) map[string]struct{}

TrimmedSet returns distinct non-empty strings after whitespace trimming.

func TruncateUTF8Bytes

func TruncateUTF8Bytes(value string, limit int) string

TruncateUTF8Bytes returns value truncated to at most limit bytes without splitting a UTF-8 encoded rune.
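Rune-safe byte truncation can be done by backing up from the byte limit to the nearest rune start. The following is a minimal sketch, not the package's implementation; truncateUTF8Bytes is a hypothetical name, and returning an empty string for a non-positive limit is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8Bytes is an illustrative stand-in, not the package's code.
func truncateUTF8Bytes(value string, limit int) string {
	if limit <= 0 {
		return "" // assumption: non-positive limits yield an empty string
	}
	if len(value) <= limit {
		return value
	}
	// If the byte at the cut point is a continuation byte, the cut would split
	// a rune, so move left until it lands on the first byte of a rune.
	cut := limit
	for cut > 0 && !utf8.RuneStart(value[cut]) {
		cut--
	}
	return value[:cut]
}

func main() {
	s := "h\u00e9llo" // U+00E9 occupies two bytes in UTF-8
	fmt.Printf("%q\n", truncateUTF8Bytes(s, 2)) // prints "h"
	fmt.Printf("%q\n", truncateUTF8Bytes(s, 3)) // prints "hé"
}
```

With a limit of 2 the result is one byte long, because keeping the second byte would split the two-byte rune.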

func UniqueLowerTrimmed

func UniqueLowerTrimmed(values []string, sortOutput bool) []string

UniqueLowerTrimmed returns distinct non-empty strings after trimming and lower-casing.

func UniqueTrimmed

func UniqueTrimmed(values []string, sortOutput bool) []string

UniqueTrimmed returns distinct non-empty strings after whitespace trimming.

func UpperTrimmed

func UpperTrimmed(value string) string

UpperTrimmed trims surrounding whitespace and converts the value to upper case.

func WordCount

func WordCount(content string) int

WordCount returns the number of whitespace-delimited words in content.

Types

type Normalizer

type Normalizer = stringnorm.Normalizer

Normalizer is the shared contract for callers that canonicalize string values before set/unique operations. A nil Normalizer preserves values as-is.
