line

package

Version: v0.4.1 (latest) | Published: Feb 6, 2026 | License: GPL-3.0 | Imports: 7 | Imported by: 7

Repository

github.com/stts-se/pronlex

README ¶

File formats

Wikispeech lexicon file format

Description of the Wikispeech lexicon file format

NST lexicon file format

This format is used for converting NST lexicon files to the Wikispeech lexicon file format

Documentation ¶

Overview ¶

Package line is used to define lexicon line formats for parsing input and printing output.

Interfaces:

* Format - simple line format definition (field names and indices)

* Parser - a more complex parser that contains a Format definition and also allows custom parsing code for cases the Format specs alone cannot handle (multi-value fields, etc).

THE WIKISPEECH FILE FORMAT ¶

The Wikispeech lexicon file format is defined in ws.go. Lexicon files are tab-separated, UTF-8 encoded text files and should contain the fields listed below. Empty fields are allowed in most positions.

Any lexicon files you want to import into the lexicon database must be in this file format.

Orth           The word's orthography
Pos            The part of speech tag
Morph          Morphological features (gender, number, etc)
WordParts      Compound parts, if any, separated by a plus sign (+)
Lemma          The word's lemma form
Paradigm       The name of the paradigm used for inflections
Lang           The word's language label
Trans1         The first transcription (default for TTS)
Translang1     The language of the Trans1
Trans2         Alternative transcription
Translang2     The language of the Trans2
Trans3         Alternative transcription
Translang3     The language of the Trans3
Trans4         Alternative transcription
Translang4     The language of the Trans4
StatusName     Status of the lexicon entry
StatusSource   Source of the status
Preferred      Takes values 1/0, and is used to define which reading for a specific
               orthography should be the standard one (in case of homographs)
Tag            A tag (string) that can be used to disambiguate between homographs if needed (default: empty)
Comments       One or more comments, each containing a label (category), a comment (text), and a source (user or other source).
               Comments are defined in the following format (separated by §§§):
                 [label: comment text] (source) §§§ [anotherlabel: another comment] (anothersource_or_user)
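
The Comments layout described above can be read with a small regular expression. The following is a minimal sketch, not the parser used by package line: the comment type, the parseComments helper and the exact whitespace handling are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// comment is a hypothetical stand-in for one parsed comment.
type comment struct {
	Label  string
	Text   string
	Source string
}

// commentRe matches a single "[label: comment text] (source)" item.
var commentRe = regexp.MustCompile(`^\[([^:\]]+):\s*([^\]]*)\]\s*\(([^)]*)\)$`)

// parseComments splits a Comments field on the §§§ separator and parses
// each item. An empty field yields no comments and no error.
func parseComments(field string) ([]comment, error) {
	var res []comment
	if strings.TrimSpace(field) == "" {
		return res, nil
	}
	for _, part := range strings.Split(field, "§§§") {
		part = strings.TrimSpace(part)
		m := commentRe.FindStringSubmatch(part)
		if m == nil {
			return nil, fmt.Errorf("invalid comment: %q", part)
		}
		res = append(res, comment{
			Label:  strings.TrimSpace(m[1]),
			Text:   strings.TrimSpace(m[2]),
			Source: strings.TrimSpace(m[3]),
		})
	}
	return res, nil
}

func main() {
	cs, err := parseComments("[label: comment text] (source) §§§ [anotherlabel: another comment] (anothersource_or_user)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range cs {
		fmt.Printf("label=%q text=%q source=%q\n", c.Label, c.Text, c.Source)
	}
}
```

The sketch rejects any item that does not match the bracket-and-parenthesis shape; whether the real importer is that strict is not stated in this documentation.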

Sample line:

finalspelet	NN	SIN|DEF|NOM|NEU	final+spelet	finalspel	s7n-övriga ex träd	sv-se	f I . "" n A: l . % s p e: . l e t	sv-se							imported	nst	false	dummytag	[assign_to: john] (jane) §§§ [nolabel: typo] (hanna)
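
To make the column layout concrete, here is a minimal sketch that splits a tab-separated line into named fields. It assumes the column order on disk matches the order of the field list above (ws.go remains the authoritative definition), and splitWSLine and wsFieldNames are hypothetical names, not part of package line.

```go
package main

import (
	"fmt"
	"strings"
)

// wsFieldNames lists the fields in the order of the table in this section.
// Assumption: this is also the column order in the file.
var wsFieldNames = []string{
	"Orth", "Pos", "Morph", "WordParts", "Lemma", "Paradigm", "Lang",
	"Trans1", "Translang1", "Trans2", "Translang2",
	"Trans3", "Translang3", "Trans4", "Translang4",
	"StatusName", "StatusSource", "Preferred", "Tag", "Comments",
}

// splitWSLine splits one line on tabs and returns the values keyed by
// field name. It fails if the number of columns is not the expected one.
func splitWSLine(l string) (map[string]string, error) {
	cols := strings.Split(l, "\t")
	if len(cols) != len(wsFieldNames) {
		return nil, fmt.Errorf("expected %d fields, found %d", len(wsFieldNames), len(cols))
	}
	fields := make(map[string]string, len(cols))
	for i, name := range wsFieldNames {
		fields[name] = cols[i]
	}
	return fields, nil
}

func main() {
	// The sample line from this section, built from its 20 columns.
	l := strings.Join([]string{
		"finalspelet", "NN", "SIN|DEF|NOM|NEU", "final+spelet", "finalspel",
		"s7n-övriga ex träd", "sv-se",
		`f I . "" n A: l . % s p e: . l e t`, "sv-se",
		"", "", "", "", "", "",
		"imported", "nst", "false", "dummytag",
		"[assign_to: john] (jane) §§§ [nolabel: typo] (hanna)",
	}, "\t")

	fields, err := splitWSLine(l)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Orth:", fields["Orth"])
	fmt.Println("Trans1:", fields["Trans1"])
	fmt.Println("StatusName:", fields["StatusName"])
}
```

Empty columns are kept as empty strings, which mirrors the statement that empty fields are allowed in most positions.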

Index ¶

func MapTranscriptions(m mapper.Mapper, e *lex.Entry) error
type Braxen
- func NewBraxen() (Braxen, error)
type Field
- func (i Field) String() string
type FileWriter
- func (w FileWriter) Size() int
- func (w FileWriter) Write(e lex.Entry) error
type Format
- func NewFormat(name string, fieldSep string, fields map[Field]int, nFields int, ...) (Format, error)
type FormatTest
type NST
- func NewNST() (NST, error)
type Parser
type WS
- func NewWS() (WS, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func MapTranscriptions ¶ added in v0.4.1

func MapTranscriptions(m mapper.Mapper, e *lex.Entry) error

MapTranscriptions maps the input entry's transcriptions (in-place)
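
The in-place contract can be illustrated with a self-contained sketch. Everything below is an assumption for illustration: the entry and transcription types are stand-ins for lex.Entry, and symbolMapper is a plain lookup table rather than the real mapper.Mapper, whose behaviour is not documented on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// transcription and entry are hypothetical stand-ins for the lex types.
type transcription struct {
	Strn string // space-separated phoneme symbols
}

type entry struct {
	Transcriptions []transcription
}

// symbolMapper is a hypothetical mapper: one symbol in, one symbol out.
type symbolMapper map[string]string

// mapTranscriptions rewrites every transcription of e in place.
// All transcriptions are mapped first, so e is left untouched on error.
func mapTranscriptions(m symbolMapper, e *entry) error {
	mapped := make([]string, len(e.Transcriptions))
	for i, t := range e.Transcriptions {
		syms := strings.Fields(t.Strn)
		for j, s := range syms {
			to, ok := m[s]
			if !ok {
				return fmt.Errorf("no mapping for symbol %q", s)
			}
			syms[j] = to
		}
		mapped[i] = strings.Join(syms, " ")
	}
	for i := range e.Transcriptions {
		e.Transcriptions[i].Strn = mapped[i]
	}
	return nil
}

func main() {
	e := &entry{Transcriptions: []transcription{{Strn: "h u: s"}}}
	m := symbolMapper{"h": "h", "u:": "uu", "s": "s"}
	if err := mapTranscriptions(m, e); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(e.Transcriptions[0].Strn)
}
```

Passing a pointer is what makes the in-place update visible to the caller; the two-pass structure is a design choice of this sketch so that a failed mapping does not leave the entry half-converted.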

Types ¶

type Braxen ¶ added in v0.4.1

type Braxen struct {
	// contains filtered or unexported fields
}

Braxen contains the line format used for Braxen lexicon data. Struct for package private usage. To create a new Braxen instance, use NewBraxen.

func NewBraxen ¶ added in v0.4.1

func NewBraxen() (Braxen, error)

NewBraxen is used to create an instance of the Braxen parser

func (Braxen) Entry2String ¶ added in v0.4.1

func (brax Braxen) Entry2String(e lex.Entry) (string, error)

Entry2String is used to generate an output line from a lex.Entry (calls underlying Format.String)

func (Braxen) Format ¶ added in v0.4.1

func (brax Braxen) Format() Format

Format is the line.Format instance used for line parsing inside of this parser

func (Braxen) Header ¶ added in v0.4.1

func (brax Braxen) Header() string

func (Braxen) Parse ¶ added in v0.4.1

func (brax Braxen) Parse(line string) (map[Field]string, error)

Parse is used for parsing input lines (calls underlying Format.Parse)

func (Braxen) ParseToEntry ¶ added in v0.4.1

func (brax Braxen) ParseToEntry(line string) (lex.Entry, string, error)

ParseToEntry is used for parsing input lines (calls underlying Format.Parse). The orthography is lowercased in the returned entry; the second return value is the input orthography in its original case.

func (Braxen) String ¶ added in v0.4.1

func (brax Braxen) String(fields map[Field]string) (string, error)

String is used to generate an output line from a set of fields (calls underlying Format.String)

type Field ¶

type Field int

Field is a simple const for line field definition types

const (
	// Orth orthography
	Orth Field = iota

	// Pos part-of-speech (noun, verb, NN, VB, etc)
	Pos

	// Morph morphological tags (case, gender, tense, etc)
	Morph

	// WordParts decompounded orthography field (for compounds)
	WordParts

	// Lang the word's language
	Lang

	// Trans1 the primary transcription
	Trans1

	// Translang1 the language of the primary transcription
	Translang1

	// Trans2 transcription variant
	Trans2

	// Translang2 language for Trans2
	Translang2

	// Trans3 transcription variant
	Trans3

	// Translang3 language for Trans3
	Translang3

	// Trans4 transcription variant
	Trans4

	// Translang4 language for Trans4
	Translang4

	// Trans5 transcription variant
	Trans5

	// Translang5 language for Trans5
	Translang5

	// Trans6 transcription variant
	Trans6

	// Translang6 language for Trans6
	Translang6

	// Lemma the lemma form. Typically orthographic lemma + some kind of (disambiguation) identifier, e.g., wind_01.
	Lemma

	// Paradigm rule reference (id) for generating inflected forms from lemma
	Paradigm

	// StatusName refers to a status category of the entry, such as 'ok', 'skip' or similar
	StatusName

	// StatusSource refers to the source of a status (user id, reference data id, etc)
	StatusSource

	// Preferred field used to label certain entries as preferred over other ones with the same orthography; 1 = preferred, 0 = not preferred; schema triggers allow only one preferred entry per orthographic word
	Preferred

	// Tag is an optional disambiguation tag
	Tag

	// Comments is an optional field
	Comments
)

func (Field) String ¶

func (i Field) String() string
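
For readers unfamiliar with the pattern, an iota-based constant type with a String method can be sketched as follows. This is a reduced, hypothetical example with three values; it is not the implementation in package line, and the fallback text for unknown values is an assumption.

```go
package main

import "fmt"

// field is a hypothetical, shortened version of an iota-based enum.
type field int

const (
	orth field = iota // 0
	pos               // 1
	morph             // 2
)

// fieldNames holds the printable name for each constant, in iota order.
var fieldNames = [...]string{"Orth", "Pos", "Morph"}

// String makes field satisfy fmt.Stringer, so values print by name.
func (f field) String() string {
	if f < 0 || int(f) >= len(fieldNames) {
		return fmt.Sprintf("field(%d)", int(f))
	}
	return fieldNames[f]
}

func main() {
	fmt.Println(orth, pos, morph)
	fmt.Println(field(7))
}
```

Because fmt checks for the Stringer interface, printing a value with fmt.Println or the %s and %v verbs uses the name instead of the underlying integer, which keeps error messages about missing or misplaced fields readable.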

type FileWriter ¶

type FileWriter struct {
	Parser Parser
	Writer io.Writer
	// contains filtered or unexported fields
}

FileWriter is used for writing entries to file (using an io.Writer)

func (FileWriter) Size ¶

func (w FileWriter) Size() int

Size returns the size of the FileWriter content

func (FileWriter) Write ¶

func (w FileWriter) Write(e lex.Entry) error

Write is used to write one lex.Entry at a time to a file (using an io.Writer)

type Format ¶

type Format struct {
	Name     string
	FieldSep string
	Fields   map[Field]int
	NFields  int
}

Format is used to define a lexicon's line. This is a struct for package private usage. To create a new Format instance, use NewFormat.

func NewFormat ¶

func NewFormat(name string, fieldSep string, fields map[Field]int, nFields int, tests []FormatTest) (Format, error)

NewFormat is a public constructor for Format with built-in error checks and tests

func (Format) Equals ¶

func (f Format) Equals(other Format) bool

Equals compares two line.Format instances

func (Format) Header ¶ added in v0.4.1

func (f Format) Header() string

func (Format) Parse ¶

func (f Format) Parse(line string) (map[Field]string, error)

Parse is used for parsing input lines

func (Format) String ¶

func (f Format) String(fields map[Field]string) (string, error)

String is used to generate an output line from a set of fields

type FormatTest ¶

type FormatTest struct {
	InputLine  string
	Fields     map[Field]string
	OutputLine string
}

FormatTest defines a test to run upon initialization of Format (using NewFormat)
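
One way to picture constructor-time tests is a round trip: parse the input line, compare the result with the expected fields, then print the fields again and compare with the expected output line. The sketch below follows that idea with hypothetical types (miniFormat, miniTest) and string keys; it is an assumption about the general technique, not a description of what NewFormat actually checks.

```go
package main

import (
	"fmt"
	"strings"
)

// miniFormat is a hypothetical, simplified line format.
type miniFormat struct {
	sep    string
	fields map[string]int // field name -> column index
	n      int            // number of columns per line
}

// miniTest mirrors the three-part shape of FormatTest.
type miniTest struct {
	inputLine  string
	fields     map[string]string
	outputLine string
}

func (f miniFormat) parse(l string) (map[string]string, error) {
	cols := strings.Split(l, f.sep)
	if len(cols) != f.n {
		return nil, fmt.Errorf("expected %d columns, found %d", f.n, len(cols))
	}
	res := make(map[string]string, len(f.fields))
	for name, i := range f.fields {
		res[name] = cols[i]
	}
	return res, nil
}

func (f miniFormat) str(fields map[string]string) string {
	cols := make([]string, f.n)
	for name, i := range f.fields {
		cols[i] = fields[name]
	}
	return strings.Join(cols, f.sep)
}

// newMiniFormat validates the indices and runs every test before
// returning the format, so a broken definition fails at construction.
func newMiniFormat(sep string, fields map[string]int, n int, tests []miniTest) (miniFormat, error) {
	f := miniFormat{sep: sep, fields: fields, n: n}
	for name, i := range fields {
		if i < 0 || i >= n {
			return miniFormat{}, fmt.Errorf("field %s: index %d out of range", name, i)
		}
	}
	for _, t := range tests {
		got, err := f.parse(t.inputLine)
		if err != nil {
			return miniFormat{}, fmt.Errorf("test %q: %v", t.inputLine, err)
		}
		if len(got) != len(t.fields) {
			return miniFormat{}, fmt.Errorf("test %q: field count mismatch", t.inputLine)
		}
		for name, want := range t.fields {
			if got[name] != want {
				return miniFormat{}, fmt.Errorf("test %q: %s = %q, want %q", t.inputLine, name, got[name], want)
			}
		}
		if out := f.str(got); out != t.outputLine {
			return miniFormat{}, fmt.Errorf("test %q: output %q, want %q", t.inputLine, out, t.outputLine)
		}
	}
	return f, nil
}

func main() {
	tests := []miniTest{{
		inputLine:  "hus\tNN\tignored",
		fields:     map[string]string{"Orth": "hus", "Pos": "NN"},
		outputLine: "hus\tNN\t",
	}}
	_, err := newMiniFormat("\t", map[string]int{"Orth": 0, "Pos": 1}, 3, tests)
	fmt.Println("construction error:", err)
}
```

The example test shows why the expected output is stored separately from the input: a column that no field maps to is dropped on the way in and comes back empty on the way out.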

type NST ¶

type NST struct {
	// contains filtered or unexported fields
}

NST contains the line format used for NST lexicon data. Struct for package private usage. To create a new NST instance, use NewNST.

func NewNST ¶

func NewNST() (NST, error)

NewNST is used to create an instance of the NST parser

func (NST) Entry2String ¶

func (nst NST) Entry2String(e lex.Entry) (string, error)

Entry2String is used to generate an output line from a lex.Entry (calls underlying Format.String)

func (NST) Format ¶

func (nst NST) Format() Format

Format is the line.Format instance used for line parsing inside of this parser

func (NST) Header ¶ added in v0.4.1

func (nst NST) Header() string

func (NST) Parse ¶

func (nst NST) Parse(line string) (map[Field]string, error)

Parse is used for parsing input lines (calls underlying Format.Parse)

func (NST) ParseToEntry ¶

func (nst NST) ParseToEntry(line string) (lex.Entry, string, error)

ParseToEntry is used for parsing input lines (calls underlying Format.Parse). The orthography is lowercased in the returned entry; the second return value is the input orthography in its original case.

func (NST) String ¶

func (nst NST) String(fields map[Field]string) (string, error)

String is used to generate an output line from a set of fields (calls underlying Format.String)

type Parser ¶

type Parser interface {

	// Format is the line.Format instance used for line parsing inside of this parser
	Format() Format

	// Parse is used for parsing input lines
	Parse(string) (map[Field]string, error)

	// String is used to generate an output line from a set of fields
	String(map[Field]string) (string, error)

	// Entry2String is used to generate an output line from an input entry
	Entry2String(e lex.Entry) (string, error)
}

Parser is used to define a lexicon's line parser. To implement your own parser, make sure to implement the functions Parse(string) and String(map[Field]string); a type satisfies the interface only if it also provides Format() and Entry2String(lex.Entry).
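
A custom parser might look like the sketch below. To stay self-contained it does not import pronlex: the parser interface is a local, simplified mirror of line.Parser (string keys instead of line.Field, no Format method), entry is a two-field stand-in for lex.Entry, and semicolonParser with its "orth;pos" layout is a hypothetical example.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a stand-in for lex.Entry, reduced to two fields.
type entry struct {
	Orth string
	Pos  string
}

// parser is a simplified local mirror of the line.Parser interface.
type parser interface {
	Parse(string) (map[string]string, error)
	String(map[string]string) (string, error)
	Entry2String(e entry) (string, error)
}

// semicolonParser handles hypothetical "orth;pos" lines.
type semicolonParser struct{}

// Compile-time check that semicolonParser satisfies the interface.
var _ parser = semicolonParser{}

// Parse splits a line into its two named fields.
func (semicolonParser) Parse(l string) (map[string]string, error) {
	cols := strings.Split(l, ";")
	if len(cols) != 2 {
		return nil, fmt.Errorf("expected 2 fields, found %d in %q", len(cols), l)
	}
	return map[string]string{"Orth": cols[0], "Pos": cols[1]}, nil
}

// String builds an output line; an orthography is required.
func (semicolonParser) String(fields map[string]string) (string, error) {
	orth := fields["Orth"]
	if orth == "" {
		return "", fmt.Errorf("missing Orth field")
	}
	return orth + ";" + fields["Pos"], nil
}

// Entry2String converts the entry to fields and reuses String.
func (p semicolonParser) Entry2String(e entry) (string, error) {
	return p.String(map[string]string{"Orth": e.Orth, "Pos": e.Pos})
}

func main() {
	var p parser = semicolonParser{}
	fields, err := p.Parse("hus;NN")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := p.Entry2String(entry{Orth: fields["Orth"], Pos: fields["Pos"]})
	fmt.Println(out, err)
}
```

Routing Entry2String through String keeps a single place where the output layout is decided, which matches the note elsewhere on this page that the bundled parsers' Entry2String calls the underlying Format.String.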

type WS ¶

type WS struct {
	// contains filtered or unexported fields
}

WS implements the line.Parser interface

func NewWS ¶

func NewWS() (WS, error)

NewWS is used to create a new instance of the WS parser

func (WS) Entry2String ¶

func (ws WS) Entry2String(e lex.Entry) (string, error)

Entry2String is used to generate an output line from a lex.Entry (calls underlying Format.String)

func (WS) Format ¶

func (ws WS) Format() Format

Format is the line.Format instance used for line parsing inside of this parser

func (WS) Header ¶ added in v0.4.1

func (ws WS) Header() string

func (WS) Parse ¶

func (ws WS) Parse(line string) (map[Field]string, error)

Parse is used for parsing input lines (calls underlying Format.Parse)

func (WS) ParseToEntry ¶

func (ws WS) ParseToEntry(line string) (lex.Entry, error)

ParseToEntry is used for parsing input lines (calls underlying Format.Parse)

func (WS) String ¶

func (ws WS) String(fields map[Field]string) (string, error)

String is used to generate an output line from a set of fields (calls underlying Format.String)
