token

package
v0.1.6 Latest
Warning

This package is not in the latest version of its module.

Published: Dec 2, 2024 License: MIT Imports: 0 Imported by: 5

Documentation

Overview

Example
package main

import (
	"fmt"

	"github.com/gnames/gner/ent/token"
)

func main() {
	text := "one\vtwo Poma-  \t\r\n tomus " +
		"dash -\nstandalone " +
		"Tora-\nBora\n\rthree 1778,\n"
	res := token.Tokenize([]rune(text), func(t token.TokenNER) token.TokenNER { return t })
	fmt.Println(res[0].Cleaned())
	fmt.Println(res[2].Cleaned())
}
Output:

one
Pomatomus

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type TokenNER added in v0.1.1

type TokenNER interface {
	// Raw is a verbatim presentation of a token as it appears in a text.
	Raw() []rune

	// Start is the index of the first rune of a token. The first rune
	// does not have to be alpha-numeric.
	Start() int

	// End is the index of the last rune of a token. The last rune does not
	// have to be alpha-numeric.
	End() int

	// Line returns the token's line number in the text.
	Line() int

	// SetLine sets the line number.
	SetLine(int)

	// Cleaned is a presentation of a token after normalization.
	Cleaned() string

	// SetCleaned replaces the existing cleaned text with a new one.
	SetCleaned(string)

	// ProcessToken computes a clean version of a name as well as properties
	// of the token.
	ProcessToken()
}

TokenNER represents a word separated by spaces in a text. Words split across new lines (for example "Poma-\ntomus" in the example above) are concatenated.

func Tokenize

func Tokenize(text []rune, wrapToken func(TokenNER) TokenNER) []TokenNER

Tokenize creates a slice containing tokens for every word in the document. The wrapToken function is applied to each token as it is created, letting callers wrap the base token in their own type that satisfies TokenNER; passing an identity function, as in the example above, keeps the default tokens.
