textutil

package
v0.0.9 Latest
Warning: This package is not in the latest version of its module.
Published: Feb 14, 2026 License: MIT Imports: 4 Imported by: 0

Documentation

Overview

Package textutil provides text processing utilities for form classification.


Constants

This section is empty.

Variables

This section is empty.

Functions

func Ngrams

func Ngrams(s string, minN, maxN int) []string

Ngrams returns the character-level n-grams of s, with n ranging from minN to maxN.

func Normalize

func Normalize(text string) string

Normalize lowercases text and normalizes whitespace.
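
One plausible reading of this behavior is sketched below using only the standard library. normalizeSketch is a hypothetical name, and trimming leading and trailing whitespace is an assumption (it falls out of strings.Fields); the package may handle the edges differently.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSketch is a hypothetical stand-in for textutil.Normalize.
// strings.Fields splits on any run of Unicode whitespace and drops
// leading/trailing whitespace (assumed, not confirmed by the docs).
func normalizeSketch(text string) string {
	collapsed := strings.Join(strings.Fields(text), " ")
	return strings.ToLower(collapsed)
}

func main() {
	fmt.Printf("%q\n", normalizeSketch("  Tax\n FORM\t1040 ")) // "tax form 1040"
}
```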

func NormalizeWhitespaces

func NormalizeWhitespaces(text string) string

NormalizeWhitespaces replaces newlines and runs of whitespace with a single space.
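
A minimal sketch of whitespace collapsing with a regular expression follows. collapseWhitespaceSketch is a hypothetical name, not the package's implementation. Note that Go's \s matches ASCII whitespace only, and this version leaves a single space where leading or trailing whitespace was; whether the real function trims is not stated in the documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// wsRun matches one or more ASCII whitespace characters
// (space, tab, newline, carriage return, form feed).
var wsRun = regexp.MustCompile(`\s+`)

// collapseWhitespaceSketch is a hypothetical stand-in for
// textutil.NormalizeWhitespaces: every whitespace run, including
// newlines, becomes exactly one space.
func collapseWhitespaceSketch(text string) string {
	return wsRun.ReplaceAllString(text, " ")
}

func main() {
	fmt.Printf("%q\n", collapseWhitespaceSketch("Name:\n\n  John\t Doe")) // "Name: John Doe"
}
```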

func NumberPattern

func NumberPattern(text string, ratio float64) string

NumberPattern replaces digits with X and letters with C if the digit ratio of text is at least ratio (the threshold). Otherwise it returns an empty string.
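
The documentation does not say how the digit ratio is computed, so the sketch below makes its assumptions explicit: the ratio is digits divided by the total number of runes, characters that are neither digits nor letters are copied through unchanged, and empty input yields an empty string. numberPatternSketch is a hypothetical name, and the real function may define the ratio differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// numberPatternSketch is a hypothetical stand-in for textutil.NumberPattern.
// Assumption: digit ratio = digit runes / all runes in text.
func numberPatternSketch(text string, ratio float64) string {
	runes := []rune(text)
	if len(runes) == 0 {
		return ""
	}
	digits := 0
	for _, r := range runes {
		if unicode.IsDigit(r) {
			digits++
		}
	}
	if float64(digits)/float64(len(runes)) < ratio {
		return "" // below the threshold: no pattern
	}
	var b strings.Builder
	for _, r := range runes {
		switch {
		case unicode.IsDigit(r):
			b.WriteRune('X')
		case unicode.IsLetter(r):
			b.WriteRune('C')
		default:
			b.WriteRune(r) // assumed: punctuation and spaces are kept
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", numberPatternSketch("AB-1234", 0.5))   // "CC-XXXX"
	fmt.Printf("%q\n", numberPatternSketch("invoice 7", 0.5)) // ""
}
```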

func TokenNgrams

func TokenNgrams(tokens []string, minN, maxN int) []string

TokenNgrams returns the n-grams of a list of tokens, with n ranging from minN to maxN. The tokens in each n-gram are joined by a single space.

func Tokenize

func Tokenize(text string) []string

Tokenize extracts word tokens from text (Unicode-aware, matching Python's (?u)\b\w+\b).

Types

This section is empty.
