tokenizer

package
v0.1.1
Published: Dec 30, 2025 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package tokenizer splits strings into lexical tokens.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func TokenizeFile

func TokenizeFile(path string, wordFilter func(string) bool) ([]string, error)

TokenizeFile reads a file and returns its tokens. File contents are read line by line to handle large files efficiently. See TokenizeString for tokenization details.
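A minimal usage sketch. The import path example.com/tokenizer, the file name words.txt, and the filter are illustrative assumptions, not taken from this page:

package main

import (
	"fmt"
	"log"

	"example.com/tokenizer" // import path is an assumption
)

func main() {
	// Optional filter: keep only words longer than two characters.
	longWords := func(w string) bool { return len(w) > 2 }

	// words.txt is a placeholder path for this example.
	tokens, err := tokenizer.TokenizeFile("words.txt", longWords)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(tokens)
}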

func TokenizeString

func TokenizeString(s string, wordFilter func(string) bool) []string

TokenizeString splits an input string into word tokens. It can handle natural language text as well as source code. Tokens containing non-ASCII characters are filtered out.

An optional filter function can be provided to include or exclude specific words from the result; it should return true to include a word.
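A sketch of tokenizing source code with a filter that drops Go keywords. The import path is an assumption, and the exact tokens returned depend on the package's splitting rules:

package main

import (
	"fmt"

	"example.com/tokenizer" // import path is an assumption
)

func main() {
	// The filter returns true to include a word; here it excludes
	// a couple of Go keywords.
	keywords := map[string]bool{"func": true, "return": true}
	filter := func(w string) bool { return !keywords[w] }

	tokens := tokenizer.TokenizeString("func add(a, b int) int { return a + b }", filter)
	fmt.Println(tokens) // exact output depends on the tokenizer's rules
}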

Types

type Case

type Case int
const (
	CaseLower Case = iota
	CaseUpper
)
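This page exports no function that accepts a Case, so the following is only a sketch of how the constants might be used to normalize token case. The normalize helper is hypothetical, not part of the package; the type and constants are redeclared locally to keep the example self-contained:

package main

import (
	"fmt"
	"strings"
)

type Case int

const (
	CaseLower Case = iota
	CaseUpper
)

// normalize is a hypothetical helper that applies a Case to a token.
func normalize(token string, c Case) string {
	switch c {
	case CaseUpper:
		return strings.ToUpper(token)
	default:
		return strings.ToLower(token)
	}
}

func main() {
	fmt.Println(normalize("Token", CaseLower)) // token
	fmt.Println(normalize("Token", CaseUpper)) // TOKEN
}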
