Documentation
Overview
Package tokenizer splits strings into lexical tokens.
Index

func TokenizeFile
func TokenizeString
Constants

This section is empty.

Variables

This section is empty.
Functions ¶
func TokenizeFile
TokenizeFile reads a file and returns its tokens. File contents are read line by line to handle large files efficiently. See TokenizeString for tokenization details.
func TokenizeString
TokenizeString splits an input string into word tokens. It can handle natural language text as well as source code. Tokens containing non-ASCII characters are filtered out.
An optional filter function can be provided to include or exclude specific words; it should return true for words to include.
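The behavior described above, splitting on word boundaries, dropping non-ASCII tokens, and applying an optional include filter, can be sketched like this. The function name `tokenizeString` and the exact splitting rule (letters, digits, and underscores form words) are assumptions for illustration, not the package's real API.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenizeString sketches TokenizeString's behavior: split the input into
// word tokens, drop any token containing non-ASCII characters, and apply an
// optional include filter. (Hypothetical illustration.)
func tokenizeString(input string, filter func(string) bool) []string {
	// Assume letters, digits, and underscores form words; everything
	// else is a separator. The real splitting rule may differ.
	words := strings.FieldsFunc(input, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '_'
	})
	var tokens []string
	for _, w := range words {
		if !isASCII(w) {
			continue // tokens with non-ASCII characters are filtered out
		}
		if filter != nil && !filter(w) {
			continue // the filter returns true to include the word
		}
		tokens = append(tokens, w)
	}
	return tokens
}

// isASCII reports whether every rune in s is within the ASCII range.
func isASCII(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func main() {
	// The non-ASCII token "résumé" is dropped.
	fmt.Println(tokenizeString("résumé parsing, v2 done", nil))
	// [parsing v2 done]

	// A filter that keeps only words longer than two characters.
	long := func(w string) bool { return len(w) > 2 }
	fmt.Println(tokenizeString("a quick test", long))
	// [quick test]
}
```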
Types