tokenizer

package
v0.1.1
Published: Dec 30, 2025 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package tokenizer splits strings into lexical tokens.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func TokenizeFile

func TokenizeFile(path string, wordFilter func(string) bool) ([]string, error)

TokenizeFile reads a file and returns its tokens. File contents are read line by line to handle large files efficiently. See TokenizeString for tokenization details.
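A minimal usage sketch. The import path example.com/tokenizer, the file name words.txt, and the filter are illustrative assumptions, not taken from this page:

package main

import (
	"fmt"
	"log"

	"example.com/tokenizer" // import path is an assumption
)

func main() {
	// Optional filter: keep only words longer than two characters.
	longWords := func(w string) bool { return len(w) > 2 }

	// words.txt is a placeholder path for this example.
	tokens, err := tokenizer.TokenizeFile("words.txt", longWords)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(tokens)
}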

func TokenizeString

func TokenizeString(s string, wordFilter func(string) bool) []string

TokenizeString splits an input string into word tokens. It can handle natural language text as well as source code. Tokens containing non-ASCII characters are filtered out.

An optional filter function can be provided to include or exclude specific words from the result; it should return true to include a word.
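A sketch of tokenizing source code with a filter that drops Go keywords. The import path is an assumption, and the exact tokens returned depend on the package's splitting rules:

package main

import (
	"fmt"

	"example.com/tokenizer" // import path is an assumption
)

func main() {
	// The filter returns true to include a word; here it excludes
	// a couple of Go keywords.
	keywords := map[string]bool{"func": true, "return": true}
	filter := func(w string) bool { return !keywords[w] }

	tokens := tokenizer.TokenizeString("func add(a, b int) int { return a + b }", filter)
	fmt.Println(tokens) // exact output depends on the tokenizer's rules
}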

Types

type Case

type Case int
const (
	CaseLower Case = iota
	CaseUpper
)
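This page exports no function that accepts a Case, so the following is only a sketch of how the constants might be used to normalize token case. The normalize helper is hypothetical, not part of the package; the type and constants are redeclared locally to keep the example self-contained:

package main

import (
	"fmt"
	"strings"
)

type Case int

const (
	CaseLower Case = iota
	CaseUpper
)

// normalize is a hypothetical helper that applies a Case to a token.
func normalize(token string, c Case) string {
	switch c {
	case CaseUpper:
		return strings.ToUpper(token)
	default:
		return strings.ToLower(token)
	}
}

func main() {
	fmt.Println(normalize("Token", CaseLower)) // token
	fmt.Println(normalize("Token", CaseUpper)) // TOKEN
}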
